Abstract. In dense Erdos-Renyi random graphs, we are interested in the events where
a large number of copies of a given subgraph occur. The mean behaviour of subgraph counts is
known, and only recently were the related large deviations results discovered. Consequently,
it is natural to ask: what is the probability that an Erdos-Renyi graph contains
an excessively large number of copies of a given subgraph? Using the large deviation principle,
we study an importance sampling scheme as a method to numerically compute the small
probabilities of large triangle counts occurring within Erdos-Renyi graphs. The exponential
tilt used in the importance sampling scheme comes from a generalized class of
exponential random graphs. Asymptotic optimality, a measure of the efficiency of the
importance sampling scheme, is achieved by the special choice of exponential random
graph that is indistinguishable from the Erdos-Renyi graph conditioned to have many
triangles. We show how this choice can be made for the conditioned Erdos-Renyi graphs
both in the replica symmetric phase and also in parts of the replica breaking phase.
Equally interestingly, we also show that the exponential tilt suggested directly by the
large deviation principle does not always yield an optimal scheme.



1. Introduction 

In this paper we study the use of importance sampling schemes to numerically estimate
the probability that an Erdos-Renyi random graph contains an unusually large number of
triangles. Consider an Erdos-Renyi random graph $\mathcal{G}_{n,p}$ on $n$ vertices with edge
probability $p \in (0,1)$. For a simple graph $X$ on $n$ vertices, let $T(X)$ denote the number of
triangles in $X$. For $p$ fixed, one can show that $\mathbb{E}[T(\mathcal{G}_{n,p})] \sim \binom{n}{3} p^3$
as $n \to \infty$. For $t > p$, what is the probability

$$\mu_n = \mathbb{P}\left( T(\mathcal{G}_{n,p}) \ge \binom{n}{3} t^3 \right) \tag{1.1}$$

that $\mathcal{G}_{n,p}$ has an atypically large number of triangles? The last few years have witnessed a
number of deep results in understanding such questions on upper tails of triangle counts,
along with more general subgraph densities (see e.g., [3,6-9,13,16]). In the dense graph
case, where the edge probability $p$ stays fixed as $n \to \infty$, [7] derived a large deviation
principle (LDP) for the rare event $\{T(\mathcal{G}_{n,p}) \ge \binom{n}{3} t^3\}$, showing that for $t$ within a
certain subset of $(p, 1]$,

$$\mathbb{P}\left( T(\mathcal{G}_{n,p}) \ge \binom{n}{3} t^3 \right) = \exp\left( -n^2 I_p(t) \left(1 + O(n^{-1/2})\right) \right) \tag{1.2}$$

where the rate function $I_p(t)$ is given by

$$I_p(t) = \frac{1}{2} \left( t \log\frac{t}{p} + (1-t) \log\frac{1-t}{1-p} \right). \tag{1.3}$$
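As a small numerical companion to (1.2) and (1.3), the following sketch evaluates the rate function and the leading-order exponent $-n^2 I_p(t)$. The function names `rate_I` and `log_tail_leading_order` are illustrative and not from the paper, and the exponent is only meaningful for those $(p,t)$ to which (1.2) applies.

```python
import math


def rate_I(p: float, t: float) -> float:
    """Rate function I_p(t) from (1.3), for 0 < p < 1 and 0 <= t <= 1."""
    def xlog(x: float, y: float) -> float:
        # x * log(x / y), with the convention 0 * log 0 = 0
        return 0.0 if x == 0.0 else x * math.log(x / y)

    return 0.5 * (xlog(t, p) + xlog(1.0 - t, 1.0 - p))


def log_tail_leading_order(n: int, p: float, t: float) -> float:
    """Leading-order log-probability -n^2 I_p(t) suggested by (1.2).

    Assumption: (p, t) lies in the subset of parameters covered by (1.2);
    the O(n^{-1/2}) correction is ignored.
    """
    return -n * n * rate_I(p, t)
```

Since the exponent grows like $n^2$, even moderate $n$ gives probabilities far too small for direct sampling, which is the motivation for the schemes studied here.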

More recently [8] showed a general large deviations principle for dense Erdos-Renyi graphs,
using the theory of limits of dense random graph sequences developed recently by Lovasz
et al. [3,14,15]. When specialized to upper tails of triangle counts, the large deviation
principle shows that for the range of $(p,t)$ considered in (1.2), the Erdos-Renyi graph
$\mathcal{G}_{n,p}$ conditioned on the rare event $\{T(\mathcal{G}_{n,p}) \ge \binom{n}{3} t^3\}$ is asymptotically
indistinguishable from another Erdos-Renyi graph $\mathcal{G}_{n,t}$ with edge probability $t$, in the sense
that a typical graph drawn from the conditioned Erdos-Renyi graph resembles a typical graph drawn
from $\mathcal{G}_{n,t}$ when $n$ is large. (Asymptotic indistinguishability is explained more precisely
at (2.11).) While this seems plausible for any $t > p$ since
$\mathbb{E}[T(\mathcal{G}_{n,t})] \sim \binom{n}{3} t^3$ as $n \to \infty$, it is not always
the case. Depending on $p$ and $t$, it may be that the graph $\mathcal{G}_{n,p}$ conditioned on the event
$\{T(\mathcal{G}_{n,p}) \ge \binom{n}{3} t^3\}$ tends to form cliques and hence does not resemble an
Erdos-Renyi graph. When the conditioned graph does resemble an Erdos-Renyi graph, we say that
$(p,t)$ is in the replica symmetric phase. On the other hand, when the conditioned graph
is not asymptotically indistinguishable from an Erdos-Renyi graph, we say that $(p,t)$ is in
the replica breaking phase. (See Definition 2.2.)

Our approach to this problem is from a computational perspective: we study the use
of importance sampling schemes for numerically estimating the probability $\mu_n$, and also
determine the schemes that perform optimally for those $(p,t)$ in the replica symmetric
phase as well as in a subset of the replica breaking phase.

The exponential decay of the probability of the event of interest makes it difficult to
estimate this probability even for moderately large $n$. Direct Monte Carlo sampling is
intractable, since the number of samples needed grows exponentially. The central strategy of
importance sampling is to sample from a different probability measure, the tilted measure, under
which the event of interest is no longer rare; one obtains more successful samples falling in the
event of interest, but each sample must then be weighted appropriately according to the
Radon-Nikodym derivative of the original measure against the tilted measure. Importance sampling
techniques have been used in many other stochastic systems, such as SDEs, Markov processes and
queuing systems; see e.g. [2,4,10,12,20] and the references therein. In particular, when a
large deviations principle is known for the stochastic system, the tilted measure commonly
used is a change of measure arising from the LDP. However, not every tilted measure associated
with the LDP works well. It is well known that a poorly chosen tilted measure can
lead to an estimator that performs worse than Monte Carlo sampling, or whose variance
blows up [11]. Thus, a careful choice of tilted measure is of utmost importance.

Given $(p,t)$, the $\mathcal{G}_{n,t}$-measure works as a tilted measure by ramping up the edge
probability of the samples; we shall refer to $\mathcal{G}_{n,t}$ as an edge tilt. As we will see later on,
even when the LDP suggests that $\mathcal{G}_{n,t}$ is asymptotically equivalent to the conditioned
$\mathcal{G}_{n,p}$ graph, the edge tilt is not necessarily a good tilted measure for estimating the
probability $\mu_n$. It turns out that the class of measures associated with the Erdos-Renyi graphs
is too limited, so we must broaden the class to consider the class of exponential random graphs.
Exponential random graphs are generally defined via a Gibbs measure. In the context of estimating
rare events for triangles, one need only consider the Gibbs measures involving edge
and triangle counts. Hence, consider the exponential random graphs $\mathcal{G}_n^{h,\beta,\alpha}$ defined via
the Gibbs measure $\mathbb{Q}_n = \mathbb{Q}_n^{h,\beta,\alpha}$ on the space of simple graphs on $n$ vertices, where

$$\mathbb{Q}_n^{h,\beta,\alpha}(X) \propto e^{H(X)}, \quad \text{where} \quad H(X) = h E(X) + \frac{\beta}{n} \binom{n}{3}^{1-\alpha} T(X)^{\alpha},$$

with parameters $h \in \mathbb{R}$ and $\beta, \alpha > 0$. Here $E(X)$ is the number of edges in the graph $X$.
Given $(p,t)$, a special choice of Gibbs measure $\mathbb{Q}_n^{h,\beta,\alpha}$ is what we will call a triangle
tilt, which works by ramping up the probability of triangles. We defer the full definition of the
triangle tilt to Definition 2.12 in Section 2.2. We shall show that in a number of different
regimes, the triangle tilt is the best possible tilt, in an asymptotic sense. In this sense,
the class of exponential random graphs is sufficiently rich to ensure the existence of an
optimal triangle tilt even for a subset of the replica breaking phase.

To understand why the class of exponential random graphs is the right class to consider,
we make a digression to mention the connection between exponential random graphs and
the conditioned Erdos-Renyi graphs. Exponential random graphs have been studied in
[1,5,16,17]. The connection between the "classical" exponential random graphs with $\alpha = 1$
and conditioned Erdos-Renyi graphs was initially observed by Chatterjee and Dey [7] when
proving the large deviations principle (1.2), and it was further developed by Lubetzky
and Zhao [16] for $\alpha > 0$. An interesting observation in [16], for the case when $(p,t)$
belongs to the replica symmetric phase, is the connection between the free energy of the
Gibbs measure and the derivative of the rate function. This connection leads to the
following duality relationship between certain parameters $(h, \beta, \alpha)$ of the Gibbs measure
and the parameters $(p,t)$ of the conditioned Erdos-Renyi graph: for $(p,t)$ that is replica
symmetric and $\alpha \in [2/3, 1]$, the typical exponential random graph resembles the
conditioned Erdos-Renyi graph if $h = \log\frac{p}{1-p}$ and the free energy of the Gibbs measure
$\mathbb{Q}_n^{h,\beta,\alpha}$, expressed in a variational formulation

$$\lim_{n \to \infty} \frac{1}{n^2} \log \sum_{X} e^{H(X)} = \sup_{0 \le u \le 1} \left[ \frac{\beta}{6} u^{3\alpha} - I_p(u) \right] - \frac{1}{2}\log(1-p),$$

where $I_p$ is the rate function at (1.3), is maximized at $u = t$.

One of our main results, Theorem 2.5, extends this duality into the replica breaking
phase, and generalizes the way of characterizing when an exponential random graph
resembles a conditioned Erdos-Renyi graph. The gist of Theorem 2.5 and its immediate
consequence is described, heuristically, as follows:

Fix $p \in (0,1)$ and $t \in [p,1]$, and let $h_p = \log\frac{p}{1-p}$. Suppose there exist
$\beta > 0$ and $\alpha \in [0,1]$ such that

$$t = \arg\sup_{0 \le u \le 1} \left[ \frac{\beta}{6} u^{3\alpha} - \phi_p(u) \right] \tag{1.4}$$

where the function $\phi_p(u)$ is a rate function (see (2.24)). Then the exponential
random graph $\mathcal{G}_n^{h_p,\beta,\alpha}$ and the Erdos-Renyi graph $\mathcal{G}_{n,p}$ conditioned
on $\{T(\mathcal{G}_{n,p}) \ge \binom{n}{3} t^3\}$ are asymptotically indistinguishable.

Thus, Theorem 2.5 provides a way to characterize the asymptotic behaviour of the 
conditioned Erdos-Renyi graph by that of an exponential random graph. 

Apart from their independent interest, Theorem 2.5 and the variational form (1.4) are the
basis for choosing the parameters for the Gibbs measure that defines the triangle tilt. In
essence, the triangle tilt can be defined when there exists $(h_p, \beta, \alpha)$ for which (1.4) holds,
which is the case for the replica symmetric phase and at least a nontrivial subset of the
replica breaking phase.

Returning to the question of the efficiency of an importance sampling scheme, one
measure of the efficiency is through the magnitude of the variance of the importance
sampling estimator. In the presence of a large deviation principle, we appeal to the notion
of asymptotic optimality, which is the property that the importance sampling estimator
has the smallest attainable variance, as $n \to \infty$. (See Definition 2.3.)

Our main results pertain to the asymptotic optimality or non-optimality of certain
importance sampling estimators. In Proposition 3.1 we prove a necessary condition for
asymptotic optimality when the tilt is based on an exponential random graph: the
exponential random graph must be asymptotically indistinguishable from the conditioned
Erdos-Renyi graph. In particular, if $(p,t)$ belongs to the replica symmetric phase, then
the necessary condition is that the exponential random graph is indistinguishable from
$\mathcal{G}_{n,t}$. On the other hand, Proposition 3.2 shows that this is not a sufficient condition for
asymptotic optimality: there is a subregime of the replica symmetric phase for which the
edge tilt produces a suboptimal estimator. It is interesting to note that although the
LDP suggests that $\mathcal{G}_{n,t}$ is the typical behaviour of the conditioned Erdos-Renyi graph in the
replica symmetric phase, directly using $\mathcal{G}_{n,t}$ as the importance sampling tilt does not
necessarily give an optimal estimator. Instead, we must be careful to use a tilt that not only is
indistinguishable from the conditioned Erdos-Renyi graph, but also gives an asymptotically
optimal estimator. It turns out that the triangle tilts are the appropriate tilts to use, and
this fact is the statement of our main optimality result, which we state here.

Theorem 1.1. Given $(p,t)$, denote $h_p = \log\frac{p}{1-p}$. Suppose there exists a triangle tilt
$\mathcal{G}_n^{h_p,\beta,\alpha}$ with parameter $\alpha > 0$ corresponding to $(p,t)$, as defined in Definition 2.12. Then
the importance sampling estimator based on the tilted measure $\mathbb{Q}_n^{h_p,\beta,\alpha}$ is asymptotically
optimal.

Organization of the paper: We start by giving precise definitions of the various
constructs arising in our study in Section 2. This culminates in Theorem 2.5, which
characterizes the limiting free energy of the exponential random graph model. The rest of
Section 2 is devoted to drawing a connection between the exponential random graph and the
Erdos-Renyi random graph conditioned on an atypical number of triangles, leading to the
derivation of the triangle tilts. Section 3 discusses and proves our main results on
asymptotic optimality or non-optimality of the importance sampling estimators. In Section 4,
we carry out numerical simulations on moderate size networks using the various proposed
tilts to illustrate and compare the viability of the importance sampling schemes.

Acknowledgement This work was funded in part through the 2011-2012 SAMSI Program 
on Uncertainty Quantification, in which each of the authors participated. JN was partially 
supported by grant NSF-DMS 1007572. SB was partially supported by grant NSF-DMS 
1105581. 

2. Large deviations, importance sampling and exponential random graphs 

A simple graph $X$ on $n$ vertices can be represented as an element of the space
$\Omega_n = \{0,1\}^{\binom{n}{2}}$. A graph $X \in \Omega_n$ will be denoted by $X = (X_{ij})_{1 \le i < j \le n}$, with
the entry $X_{ij}$ indicating the presence or absence of an edge between vertices $i$ and $j$. For a
given edge probability $p \in [0,1]$, an Erdos-Renyi random graph $\mathcal{G}_{n,p}$ is a graph on $n$ vertices
such that each edge is present independently with probability $p$. We shall use $\mathbb{P}_{n,p}$ to represent
the probability measure on $\Omega_n$ induced by the Erdos-Renyi graph $\mathcal{G}_{n,p}$. The probability
of a fixed graph $X$ under the measure $\mathbb{P}_{n,p}$ can be explicitly computed as

$$\mathbb{P}(\mathcal{G}_{n,p} = X) = \mathbb{P}_{n,p}(X) = \prod_{i<j} p^{X_{ij}} (1-p)^{1 - X_{ij}} = (1-p)^{\binom{n}{2}} e^{h_p E(X)} \tag{2.1}$$

where $h_p := \log\frac{p}{1-p}$, and $E(X) := \sum_{i<j} X_{ij}$ is the number of edges in $X$. Let $T(X)$
denote the number of triangles in the graph $X$:

$$T(X) = \sum_{i<j<k} X_{ij} X_{jk} X_{ik}.$$

Also let the event $\mathcal{W}_{n,t} = \{X \in \Omega_n \mid T(X) \ge \binom{n}{3} t^3\}$ denote the upper tails of triangle
counts.

Importance sampling. If $\{X_k\}_{k=1}^{\infty} \subset \Omega_n$ is a sequence of Erdos-Renyi random graphs
generated independently from $\mathbb{P}_{n,p}$, then for any integer $K \ge 1$,

$$M_K = \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}_{\mathcal{W}_{n,t}}(X_k)$$

is an unbiased estimate of $\mu_n$. By the law of large numbers, $M_K \to \mu_n$ with probability
one as $K \to \infty$. Although this estimate of $\mu_n$ is very simple, the relative error is

$$\frac{\sqrt{\mathrm{Var}(M_K)}}{\mu_n} = \frac{\sqrt{\mu_n - (\mu_n)^2}}{\mu_n \sqrt{K}},$$

which scales like $(K \mu_n)^{-1/2}$ as $\mu_n \to 0$. Hence the relative error may be very large in
the large deviation regime where $\mu_n \ll 1$, unless we have at least $K \sim O(\mu_n^{-1})$ samples.
Therefore, it is desirable to devise an estimate of $\mu_n$ which, compared to this simple
Monte Carlo estimate, attains the same accuracy with fewer samples or lower
computational cost.
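The simple Monte Carlo estimate and its relative error can be sketched as follows. This is a minimal illustration using only the Python standard library and brute-force triangle counting, so it is practical only for small $n$; the function names are hypothetical and not taken from the paper.

```python
import math
import random
from itertools import combinations


def sample_er(n, p, rng):
    """Adjacency matrix (lists of 0/1) of one Erdos-Renyi sample on n vertices."""
    A = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < p:
            A[i][j] = A[j][i] = 1
    return A


def count_triangles(A):
    """T(X): number of triangles, by brute force over vertex triples."""
    n = len(A)
    return sum(A[i][j] * A[j][k] * A[i][k]
               for i, j, k in combinations(range(n), 3))


def naive_mc(n, p, t, K, seed=0):
    """Simple Monte Carlo estimate M_K of mu_n = P(T >= C(n,3) t^3)."""
    rng = random.Random(seed)
    threshold = math.comb(n, 3) * t ** 3
    hits = sum(count_triangles(sample_er(n, p, rng)) >= threshold
               for _ in range(K))
    return hits / K


def relative_error(mu, K):
    """Relative error sqrt(mu - mu^2) / (mu sqrt(K)) of M_K, for 0 < mu <= 1."""
    return math.sqrt(mu - mu * mu) / (mu * math.sqrt(K))
```

For a target probability of order $10^{-6}$, `relative_error` shows that on the order of $10^{6}$ samples are needed just to bring the relative error down to order one, which is the difficulty described above.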

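Importance sampling is a Monte Carlo algorithm based on a change of measure. Suppose
that $\mathbb{P}_{n,p}$ is absolutely continuous with respect to another measure $\mathbb{Q}$ on $\Omega_n$ with

$$\frac{d\mathbb{P}_{n,p}}{d\mathbb{Q}} = Y^{-1}.$$

Then we have

$$\mu_n = \mathbb{E}[M_K] = \mathbb{E}\left[ \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}_{\mathcal{W}_{n,t}}(X_k) \right] = \mathbb{E}_{\mathbb{Q}}\left[ \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}_{\mathcal{W}_{n,t}}(\tilde{X}_k)\, Y^{-1}(\tilde{X}_k) \right] \tag{2.2}$$

where $\mathbb{E}_{\mathbb{Q}}$ denotes expectation with respect to $\mathbb{Q}$, and we now use $\{\tilde{X}_k\}_{k=1}^{K}$ to denote a
set of random graphs sampled independently from the new measure $\mathbb{Q}$. If we define

$$\tilde{M}_K = \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}_{\mathcal{W}_{n,t}}(\tilde{X}_k)\, Y^{-1}(\tilde{X}_k), \tag{2.3}$$

then $\tilde{M}_K$ is also an unbiased estimate of $\mu_n$, and the relative error is now:

$$\frac{\sqrt{\mathrm{Var}_{\mathbb{Q}}(\tilde{M}_K)}}{\mathbb{E}_{\mathbb{Q}}(\tilde{M}_K)} = \frac{\sqrt{\mathbb{E}_{\mathbb{Q}}\left[ (\mathbf{1}_{\mathcal{W}_{n,t}}(X)\, Y^{-1})^2 \right] - (\mu_n)^2}}{\mu_n \sqrt{K}}. \tag{2.4}$$

Formally this is optimized by the choice $Y = (\mu_n)^{-1} \mathbf{1}_{\mathcal{W}_{n,t}}(X)$, in which case the relative
error is zero. Such a choice for $\mathbb{Q}$ is not feasible, however, since normalizing $Y$ would
require a priori knowledge of $\mu_n = \mathbb{P}_{n,p}(\mathcal{W}_{n,t})$. Intuitively, we should choose the tilted
measure $\mathbb{Q}$ so that $\tilde{X}_k \in \mathcal{W}_{n,t}$ occurs with high probability under $\mathbb{Q}$.

We will refer to $Y^{-1}$ as the importance sampling weights, and $\mathbb{Q}$ as the tilted measure,
or tilt. If $\mathbb{Q}$ arises naturally as the measure induced by a random graph $\mathcal{G}_n$, we will also
refer to $\mathcal{G}_n$ as the tilt.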
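To make the weights concrete, consider the simplest tilt, sampling from an Erdos-Renyi graph with a larger edge probability $t$ (the edge tilt of the introduction). By (2.1), the weight $Y^{-1}(X) = \mathbb{P}_{n,p}(X)/\mathbb{P}_{n,t}(X)$ depends on $X$ only through $E(X)$. The sketch below, with hypothetical function names and brute-force triangle counting, is only an illustration of (2.3) for small $n$; it does not claim that this tilt is efficient.

```python
import math
import random
from itertools import combinations


def logit(x):
    """h_x = log(x / (1 - x)), for 0 < x < 1."""
    return math.log(x / (1.0 - x))


def log_weight(num_edges, n, p, t):
    """log Y^{-1}(X) = log(P_{n,p}(X) / P_{n,t}(X)), computed from (2.1)."""
    N = math.comb(n, 2)
    return N * math.log((1.0 - p) / (1.0 - t)) + (logit(p) - logit(t)) * num_edges


def edge_tilt_estimate(n, p, t, K, seed=0):
    """Importance sampling estimate (2.3) of mu_n, sampling from G(n, t)."""
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))
    threshold = math.comb(n, 3) * t ** 3
    total = 0.0
    for _ in range(K):
        A = [[0] * n for _ in range(n)]
        edges = 0
        for i, j in pairs:
            if rng.random() < t:          # sample under the tilted measure
                A[i][j] = A[j][i] = 1
                edges += 1
        triangles = sum(A[i][j] * A[j][k] * A[i][k] for i, j, k in triples)
        if triangles >= threshold:        # indicator of the rare event
            total += math.exp(log_weight(edges, n, p, t))
    return total / K
```

Working with the logarithm of the weight avoids underflow in the factor $((1-p)/(1-t))^{\binom{n}{2}}$ for larger $n$.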



Asymptotic optimality and large deviations. In view of (2.3), one way of comparing
the efficiency of importance sampling estimators is to look at which estimator has the
smaller second moment. When the family of measures $\mathbb{P}_{n,p}$ possesses a large deviations
principle, a notion of asymptotic optimality (or asymptotic efficiency) of the importance
sampling estimator in (2.3) can be defined with the interpretation that the second moment of the
estimator is the smallest possible in the asymptotic sense, as afforded by the large deviation
principle ([4] and see also Definition 2.3). Thus, before defining asymptotic optimality, we first
proceed with a description of the large deviations principle for Erdos-Renyi random graphs.

In the context of Erdos-Renyi random graphs, Chatterjee and Varadhan [8] have proved
a general large deviation principle which is based on the theory of dense graph limits
developed by [3]. In this framework, a random graph is represented as a function $X(x,y) \in \mathcal{W}$,
where $\mathcal{W}$ is the set of all measurable functions $f : [0,1]^2 \to [0,1]$ satisfying $f(x,y) = f(y,x)$.
Specifically, a finite simple graph $X$ on $n$ vertices is represented by the function,
or graphon,

$$X(x,y) = \sum_{i,j=1}^{n} X_{ij}\, \mathbf{1}_{\left[\frac{i-1}{n}, \frac{i}{n}\right) \times \left[\frac{j-1}{n}, \frac{j}{n}\right)}(x,y) \in \mathcal{W}. \tag{2.5}$$

Here we treat $(X_{ij})$ as a symmetric matrix with entries in $\{0,1\}$ and $X_{ii} = 0$ for all $i$.
In general, for a function $f \in \mathcal{W}$, $f(x,y)$ can be interpreted as the probability of having
an edge between vertices $x$ and $y$. Then, we define the quotient space $\widetilde{\mathcal{W}}$ under the
equivalence relation defined by $f \sim g$ if $f(x,y) = g(\sigma x, \sigma y)$ for some measure preserving
bijection $\sigma : [0,1] \to [0,1]$. Intuitively, an equivalence class contains graphons that are
equal after a relabelling of vertices. (See, e.g., [3,8] for further exploration and properties
of the quotient space.)

By identifying a finite graph $X$ with its graphon representation, we can consider the
probability measure $\mathbb{P}_{n,p}$ as a measure induced on $\widetilde{\mathcal{W}}$ supported on the subset of graphons
of finite graphs. For $f \in \mathcal{W}$, denote

$$\mathcal{E}(f) = \int_0^1 \int_0^1 f(x,y)\, dx\, dy \tag{2.6}$$

and

$$\mathcal{T}(f) = \int_0^1 \int_0^1 \int_0^1 f(x,y) f(y,z) f(x,z)\, dx\, dy\, dz. \tag{2.7}$$

We see that $E(X) = \frac{n^2}{2} \mathcal{E}(X)$ and $T(X) = \frac{n^3}{6} \mathcal{T}(X)$, so that $\mathcal{E}$ and $\mathcal{T}$ represent edge and
triangle densities of the graph $X$, respectively. Then, rather than considering the event
$\mathcal{W}_{n,t}$, we shall equivalently consider the upper tails of triangle densities,

$$\mathcal{W}_t := \{ f \in \mathcal{W} \mid \mathcal{T}(f) \ge t^3 \}.$$
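For a finite graph, the integrals (2.6) and (2.7) of the step-function graphon (2.5) reduce to normalized sums over the symmetric adjacency matrix. The following sketch (function names are illustrative) computes both densities and can be used to check the relations $E(X) = \frac{n^2}{2}\mathcal{E}(X)$ and $T(X) = \frac{n^3}{6}\mathcal{T}(X)$.

```python
def edge_density(A):
    """E(f) in (2.6) for the graphon of a graph with symmetric 0/1 matrix A."""
    n = len(A)
    return sum(map(sum, A)) / n ** 2


def triangle_density(A):
    """T(f) in (2.7): sum over ordered triples (i, j, k) divided by n^3."""
    n = len(A)
    total = 0
    for i in range(n):
        for j in range(n):
            if A[i][j]:
                for k in range(n):
                    total += A[j][k] * A[i][k]
    return total / n ** 3
```

For the complete graph on 4 vertices, each unordered edge is counted twice and each triangle six times, which is where the factors $n^2/2$ and $n^3/6$ come from.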
The large deviation principle of Chatterjee and Varadhan [8] implies that for any $p \in (0,1)$
and $t \in [p,1]$,

$$\lim_{n \to \infty} \frac{1}{n^2} \log \mathbb{P}\left( \mathcal{T}(\mathcal{G}_{n,p}) \ge t^3 \right) = -\phi(p,t) \tag{2.8}$$

where $\phi(p,t)$ is the large deviation decay rate given by a variational form,

$$\phi(p,t) = \inf\{ \mathcal{I}_p(f) \mid f \in \mathcal{W},\ \mathcal{T}(f) \ge t^3 \} = \inf_{f \in \mathcal{W}_t} [\mathcal{I}_p(f)]. \tag{2.9}$$

Here,

$$\mathcal{I}_p(f) := \int_0^1 \int_0^1 I_p(f(x,y))\, dx\, dy \tag{2.10}$$

is the large deviation rate function, where $I_p : [0,1] \to \mathbb{R}$ is defined at (1.3). A further
important consequence of the large deviation principle concerns the typical behaviour of
the conditioned probability measure $\mathbb{P}_{n,p}(\,\cdot \mid \mathcal{W}_t)$.

When we refer to $\mathcal{G}_{n,p}$ conditioned on the event $\{\mathcal{T}(f) \ge t^3\}$, we mean the random graph
whose law is given by this conditioned probability measure.

Lemma 2.1. ([8, Theorem 3.1], Lemma A.1) Let $\mathcal{F}^* \subset \widetilde{\mathcal{W}}$ be the set of graphs that
optimize the variational form in (2.9). Then the Erdos-Renyi graph $\mathcal{G}_{n,p}$ conditioned on
$\{\mathcal{T}(f) \ge t^3\}$ is asymptotically indistinguishable from the minimal set $\mathcal{F}^*$.

The term "asymptotically indistinguishable" in Lemma 2.1 roughly means that the
graphon representation of the graph converges in probability, under the cut distance
metric, to the set $\mathcal{F}^*$ at an exponential rate as $n \to \infty$. Intuitively, this means
that the typical conditioned Erdos-Renyi graph resembles some graph $f^* \in \mathcal{F}^*$ for large
$n$. In order to give a more precise definition of asymptotic indistinguishability, we first
recall the cut distance metric $\delta_\square$, defined for $f, g \in \mathcal{W}$ by

$$\delta_\square(f,g) = \inf_{\sigma} \sup_{S,T \subset [0,1]} \left| \int_{S \times T} \left( f(\sigma x, \sigma y) - g(x,y) \right) dx\, dy \right|$$

where the infimum is taken over all measure-preserving bijections $\sigma : [0,1] \to [0,1]$. For
$\mathcal{F}_1, \mathcal{F}_2 \subset \widetilde{\mathcal{W}}$,

$$\delta_\square(\mathcal{F}_1, \mathcal{F}_2) = \inf_{f_1 \in \mathcal{F}_1,\, f_2 \in \mathcal{F}_2} \delta_\square(f_1, f_2).$$

It is known by [14] that $(\widetilde{\mathcal{W}}, \delta_\square)$ is a compact metric space.

We say that a random graph $\mathcal{G}_n$ on $n$ vertices is asymptotically indistinguishable from
a subset $\mathcal{F} \subset \widetilde{\mathcal{W}}$ if for any $\epsilon_1 > 0$ there is $\epsilon_2 > 0$ such that

$$\limsup_{n \to \infty} \frac{1}{n^2} \log \mathbb{P}\left( \delta_\square(\mathcal{G}_n, \mathcal{F}) > \epsilon_1 \right) < -\epsilon_2. \tag{2.11}$$

Further, we say that $\mathcal{G}_n$ is asymptotically indistinguishable from the minimal set $\mathcal{F} \subset \widetilde{\mathcal{W}}$
if $\mathcal{F}$ is the smallest closed subset of $\widetilde{\mathcal{W}}$ that $\mathcal{G}_n$ is asymptotically indistinguishable from.
Clearly, if $\mathcal{G}_n$ is asymptotically indistinguishable from a singleton set $\mathcal{F}$, then $\mathcal{F}$ is, trivially,
minimal. Finally, we say two random graphs $\mathcal{G}_n^{1}, \mathcal{G}_n^{2}$ are asymptotically indistinguishable
if they are each asymptotically indistinguishable from the same minimal set $\mathcal{F} \subset \widetilde{\mathcal{W}}$.
Intuitively, this means that the random behaviour, or the typical graphs, of $\mathcal{G}_n^{1}$ resemble
those of $\mathcal{G}_n^{2}$ for large $n$. (See [5] and [8] for a wide-ranging exploration of this metric in the
context of describing limits of dense random graph sequences.)
Using this terminology, we observe that an Erdos-Renyi graph $\mathcal{G}_{n,u}$ is asymptotically
indistinguishable from the singleton set containing the constant function $f^* \equiv u$. The
question of whether the conditioned Erdos-Renyi graph again resembles an Erdos-Renyi graph
leads to the following definition.

Definition 2.2. The replica symmetric phase is the regime of parameters $(p,t)$ for which
the large deviations rate satisfies

$$\inf_{f \in \mathcal{W}_t} [\mathcal{I}_p(f)] = I_p(t), \tag{2.12}$$

and the infimum is uniquely attained at the constant function $t$.

The replica breaking phase is the regime of parameters $(p,t)$ that are not in the replica
symmetric phase. ■

Hence, the notion of replica symmetry is a property of the rare event problem, where,
conditioned on the event $\{\mathcal{T}(f) \ge t^3\}$, the Erdos-Renyi graph $\mathcal{G}_{n,p}$ behaves like another
Erdos-Renyi graph $\mathcal{G}_{n,t}$ with the higher edge density $t$, for large $n$. In contrast, the
conditioned graph in the replica breaking phase does not resemble any Erdos-Renyi graph, and has
been conjectured to exhibit a clique-like structure with edge density less than $t$. The
term "replica symmetric phase" is borrowed from [8], which in turn was inspired by the
statistical physics literature. However, we remark that this term has been used by different
authors to refer to other instances of graphs behaving like an Erdos-Renyi graph.

The large deviations principle gives us an estimate of the relative error of the importance
sampling estimate (2.3). For any fixed $K$, it is clear from (2.4) that minimizing the relative
error is equivalent to minimizing the second moment $\mathbb{E}_{\mathbb{Q}}[(\mathbf{1}_{\mathcal{W}_t} Y^{-1})^2]$. By Jensen's
inequality, we have the following asymptotic lower bound:

$$\liminf_{n \to \infty} \frac{1}{n^2} \log \mathbb{E}_{\mathbb{Q}}\left[ (\mathbf{1}_{\mathcal{W}_t} Y^{-1})^2 \right] \ge -2 \inf_{f \in \mathcal{W}_t} [\mathcal{I}_p(f)] = -2\phi(p,t). \tag{2.13}$$

This leads to the definition of asymptotic optimality.

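Definition 2.3. A family of tilted measures $\mathbb{Q}_n$ on $\mathcal{W}$ is said to be asymptotically optimal
if

$$\lim_{n \to \infty} \frac{1}{n^2} \log \mathbb{E}_{\mathbb{Q}_n}\left[ (\mathbf{1}_{\mathcal{W}_t} Y^{-1})^2 \right] = -2 \inf_{f \in \mathcal{W}_t} [\mathcal{I}_p(f)].$$

In contrast, the second moment of each term in the simple Monte Carlo method satisfies

$$\lim_{n \to \infty} \frac{1}{n^2} \log \mathbb{E}_{\mathbb{P}_{n,p}}\left[ \mathbf{1}_{\mathcal{W}_t}^2 \right] = -\inf_{f \in \mathcal{W}_t} \mathcal{I}_p(f) = -\phi(p,t) > -2\phi(p,t).$$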

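Thus, the simple Monte Carlo method is not asymptotically optimal. Observe that
Jensen's inequality for conditional expectation implies

$$\left( \mathbb{E}_{\mathbb{P}_n}[\mathbf{1}_{\mathcal{W}_t} Y] \right)^{-1} \le \mathbb{P}_n(\mathcal{W}_t)^{-2}\, \mathbb{E}_{\mathbb{P}_n}\left( \mathbf{1}_{\mathcal{W}_t} Y^{-1} \right) = \mathbb{P}_n(\mathcal{W}_t)^{-2}\, \mathbb{E}_{\mathbb{Q}_n}\left( \mathbf{1}_{\mathcal{W}_t} Y^{-2} \right). \tag{2.14}$$

Since $\mathbb{Q}_n(\mathcal{W}_t) = \mathbb{E}_{\mathbb{P}_n}[\mathbf{1}_{\mathcal{W}_t} Y]$, if $\mathbb{Q}_n$ is asymptotically optimal, we must have

$$\liminf_{n \to \infty} \frac{1}{n^2} \log \mathbb{Q}_n(\mathcal{W}_t) \ge \liminf_{n \to \infty} \frac{2}{n^2} \log \mathbb{P}_n(\mathcal{W}_t) - \lim_{n \to \infty} \frac{1}{n^2} \log \mathbb{E}_{\mathbb{Q}_n}\left( \mathbf{1}_{\mathcal{W}_t} Y^{-2} \right) = 0, \tag{2.15}$$

which is consistent with the intuition that a good choice of $\mathbb{Q}_n$ should place the sampled graph
in $\mathcal{W}_t$ with high probability.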
2.1. Asymptotic behavior of exponential random graphs. To find "good" importance
sampling tilted measures, we focus on the class of exponential random graphs. The
exponential random graph is a random graph on $n$ vertices defined by the Gibbs measure

$$\mathbb{Q}(X) = \mathbb{Q}_n^{h,\beta,\alpha}(X) \propto e^{n^2 \mathcal{H}(X)} \tag{2.16}$$

on $\Omega_n$, where for given $h \in \mathbb{R}$, $\beta \in \mathbb{R}_+$, $\alpha > 0$, the Hamiltonian is

$$\mathcal{H}(X) = \frac{h}{2} \mathcal{E}(X) + \frac{\beta}{6} \mathcal{T}(X)^{\alpha}. \tag{2.17}$$

We will use $\psi_n = \psi_n^{h,\beta,\alpha}$ to denote the log of the normalizing constant (free energy)

$$\psi_n = \psi_n^{h,\beta,\alpha} = \frac{1}{n^2} \log \sum_{X \in \Omega_n} e^{n^2 \mathcal{H}(X)}$$

so that $\mathbb{Q}_n^{h,\beta,\alpha}(X) = \exp(n^2(\mathcal{H}(X) - \psi_n))$. We denote by $\mathcal{G}_n^{h,\beta,\alpha}$ the exponential random
graph defined by the Gibbs measure (2.16). The case where $\alpha = 1$ is the "classical"
exponential random graph model that has an enormous literature in the social sciences
(see e.g. [18,19] and the references therein) and has been rigorously studied in a number of recent
papers (see e.g. [1,5,16,17,21,22]). In this case, the Hamiltonian can be rewritten as
$n^2 \mathcal{H}(X) = h E(X) + \frac{\beta}{n} T(X)$. We will drop the superscript $\alpha$ in $\psi_n^{h,\beta,\alpha}$ when $\alpha = 1$. The
generalization to the exponential random graph with the parameter $\alpha$ was first proposed
in [16].

Observe that the Erdos-Renyi random graph is a special case of the exponential random
graph: if $\beta = 0$ and $h = h_p$ with $h_p$ defined by (2.1), then $\mathbb{Q}_n^{h_p,0,\alpha} = \mathbb{P}_{n,p}$ for any $\alpha > 0$
and the edges are independent with probability $p$. On the other hand, choosing $\beta > 0$
introduces a non-trivial dependence between the edges. By adjusting the parameters
$(h, \beta, \alpha)$, the Gibbs measure $\mathbb{Q}_n^{h,\beta,\alpha}$ can be adjusted to favor edges and triangles to varying
degree.
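The following sketch evaluates the exponent $n^2 \mathcal{H}(X)$ of (2.16) from the raw counts $E(X)$ and $T(X)$, and uses it in a single-edge-flip Metropolis sweep targeting the Gibbs measure. The Metropolis scheme is an assumption made here for illustration (exact sampling from (2.16) is not available in closed form when $\beta > 0$); it is not necessarily the sampler used for the simulations in this paper, and all function names are hypothetical.

```python
import math
import random
from itertools import combinations


def count_edges_triangles(A):
    """Return (E(X), T(X)) by brute force for a symmetric 0/1 matrix A."""
    n = len(A)
    E = sum(A[i][j] for i, j in combinations(range(n), 2))
    T = sum(A[i][j] * A[j][k] * A[i][k] for i, j, k in combinations(range(n), 3))
    return E, T


def n2_hamiltonian(E, T, n, h, beta, alpha):
    """n^2 * H(X) from (2.17), using E = n^2/2 * density and T = n^3/6 * density."""
    edge_density = 2.0 * E / n ** 2
    tri_density = 6.0 * T / n ** 3
    return n ** 2 * (0.5 * h * edge_density + (beta / 6.0) * tri_density ** alpha)


def metropolis_sweep(A, h, beta, alpha, rng):
    """One sweep of single-edge-flip Metropolis updates (assumed sampler)."""
    n = len(A)
    E, T = count_edges_triangles(A)
    for i, j in combinations(range(n), 2):
        # Flipping edge {i, j} changes T by the number of common neighbours.
        common = sum(A[i][k] * A[j][k] for k in range(n) if k != i and k != j)
        sign = -1 if A[i][j] else 1
        E_new, T_new = E + sign, T + sign * common
        log_ratio = (n2_hamiltonian(E_new, T_new, n, h, beta, alpha)
                     - n2_hamiltonian(E, T, n, h, beta, alpha))
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            A[i][j] = A[j][i] = 1 - A[i][j]
            E, T = E_new, T_new
    return A
```

With $\beta = 0$ the exponent reduces to $hE(X)$, and with $\alpha = 1$ it reduces to $hE(X) + \frac{\beta}{n}T(X)$, matching the two special cases discussed above.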

The asymptotic behavior of the exponential random graph measures $\mathbb{Q}_n^{h,\beta,\alpha}$ and the free
energy $\psi_n^{h,\beta,\alpha}$ is partially characterized by the following result of Chatterjee and Diaconis
[5] and Lubetzky and Zhao [16]. In what follows, we will make use of the functions

$$I(u) = \frac{1}{2} u \log u + \frac{1}{2} (1-u) \log(1-u) \tag{2.18}$$

on $u \in [0,1]$ and, for $f \in \mathcal{W}$,

$$\mathcal{I}(f) := \int_0^1 \int_0^1 I(f(x,y))\, dx\, dy. \tag{2.19}$$

Theorem 2.4. (a) [5, Theorems 4.1, 4.2] For the classical exponential random graph with
$\alpha = 1$, the free energy satisfies

$$\lim_{n \to \infty} \psi_n^{h,\beta} = \sup_{0 \le u \le 1} \left[ \frac{h}{2} u + \frac{\beta}{6} u^3 - I(u) \right]. \tag{2.20}$$

If the supremum in (2.20) is attained at a unique point $u^* \in [0,1]$, then the exponential
random graph $\mathcal{G}_n^{h,\beta}$ is asymptotically indistinguishable from the Erdos-Renyi graph $\mathcal{G}_{n,u^*}$.

(b) [16, Theorems 1.3, 4.3] For the exponential random graph with parameter
$\alpha \in [2/3, 1]$, the free energy satisfies

$$\lim_{n \to \infty} \psi_n^{h,\beta,\alpha} = \sup_{0 \le u \le 1} \left[ \frac{h}{2} u + \frac{\beta}{6} u^{3\alpha} - I(u) \right]. \tag{2.21}$$

If the supremum in (2.21) is attained at a unique point $u^* \in [0,1]$, then the exponential
random graph $\mathcal{G}_n^{h,\beta,\alpha}$ is asymptotically indistinguishable from the Erdos-Renyi graph $\mathcal{G}_{n,u^*}$.

Our main result in this section, stated next, is the generalization of the variational
formulation for the free energy of the Gibbs measure of any exponential random graph.
This result leads to the connection between the exponential random
graph and the conditioned Erdos-Renyi graph. Before stating the result we will need some
extra notation. Extend the Hamiltonian defined in (2.17) to the space of graphons in the
natural way,

$$\mathcal{H}(f) := \frac{h}{2} \mathcal{E}(f) + \frac{\beta}{6} \mathcal{T}(f)^{\alpha}, \tag{2.22}$$

where we recall the definitions of the densities of edges and triangles for graphons given
respectively in (2.6) and (2.7). For fixed $q \in (0,1)$, recall the function $\mathcal{I}_q(f)$ from (2.10)
and the function $\mathcal{I}(f)$ from (2.19).

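Theorem 2.5. Given any Gibbs measure parameters $(h, \beta, \alpha) \in \mathbb{R} \times \mathbb{R}_+ \times (0,1]$, assume
wlog that $h = h_q = \log\frac{q}{1-q}$ for some $q \in (0,1)$. For $u \in [0,1]$, denote
$\partial\mathcal{W}_u := \{ f \in \mathcal{W} \mid \mathcal{T}(f) = u^3 \}$ and let $\mathcal{F}^*_u \subset \widetilde{\mathcal{W}}$ be the set of minimizers of
$\inf_{f \in \partial\mathcal{W}_u} [\mathcal{I}_q(f)]$. Then for the exponential random graph $\mathcal{G}_n^{h_q,\beta,\alpha}$, the free energy satisfies

$$\lim_{n \to \infty} \psi_n^{h_q,\beta,\alpha} = \sup_{f \in \mathcal{W}} [\mathcal{H}(f) - \mathcal{I}(f)] = \sup_{0 \le u \le 1} \left[ \frac{\beta}{6} u^{3\alpha} - \phi_q(u) \right] - \frac{1}{2} \log(1-q) \tag{2.23}$$

where

$$\phi_q(u) = \inf_{f \in \partial\mathcal{W}_u} [\mathcal{I}_q(f)]. \tag{2.24}$$

The supremum, $\sup_{f \in \mathcal{W}} [\mathcal{H}(f) - \mathcal{I}(f)]$, is attained exactly on the set $\mathcal{F}^*_{v^*}$, where $v^*$
maximizes the RHS of (2.23).

Further, if $(q, v^*)$ belongs to the replica symmetric phase, then the supremum,
$\sup_{f \in \mathcal{W}} [\mathcal{H}(f) - \mathcal{I}(f)]$, is attained uniquely by the constant function $f_{v^*} \equiv v^*$, and

$$\lim_{n \to \infty} \psi_n^{h_q,\beta,\alpha} = V(v^*) = \sup_{0 \le u \le 1} [V(u)] \tag{2.25}$$

where

$$V(u) = V(u; h_q, \beta, \alpha) = \frac{h_q}{2} u + \frac{\beta}{6} u^{3\alpha} - I(u).$$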
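The one-dimensional problem $\sup_{0 \le u \le 1} V(u)$ in (2.25) is easy to explore numerically. The sketch below uses a plain grid search (an illustrative choice, with hypothetical function names); it only maximizes over constant graphons, so by Theorem 2.5 it represents the limiting free energy only when $(q, v^*)$ is replica symmetric.

```python
import math


def entropy_I(u):
    """I(u) from (2.18), with the convention 0 * log 0 = 0."""
    def xlogx(x):
        return 0.0 if x <= 0.0 else x * math.log(x)

    return 0.5 * xlogx(u) + 0.5 * xlogx(1.0 - u)


def V(u, h, beta, alpha):
    """V(u; h, beta, alpha) = (h/2) u + (beta/6) u^(3 alpha) - I(u)."""
    return 0.5 * h * u + (beta / 6.0) * u ** (3.0 * alpha) - entropy_I(u)


def argmax_V(h, beta, alpha, grid=10001):
    """Grid-search maximizer of V over [0, 1] (first maximizer if ties occur)."""
    us = [k / (grid - 1) for k in range(grid)]
    return max(us, key=lambda u: V(u, h, beta, alpha))
```

As a sanity check, for $\beta = 0$ and $h = h_q$ the maximizer is $u = q$ and the maximum equals $-\frac{1}{2}\log(1-q)$, the limiting free energy of the Erdos-Renyi graph $\mathcal{G}_{n,q}$.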

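Proof. The first equality in (2.23) follows from Theorem 3.1 in [5]. To show the second equality,
suppose $f \in \partial\mathcal{W}_u$, for $u \in (0,1)$. Then

$$\mathcal{H}(f) - \mathcal{I}(f) = \frac{h_q}{2} \mathcal{E}(f) + \frac{\beta}{6} u^{3\alpha} - \mathcal{I}(f) = \frac{\beta}{6} u^{3\alpha} - \mathcal{I}_q(f) - \frac{1}{2} \log(1-q). \tag{2.26}$$

This implies that

$$\sup_{f \in \partial\mathcal{W}_u} [\mathcal{H}(f) - \mathcal{I}(f)] = \frac{\beta}{6} u^{3\alpha} - \inf_{f \in \partial\mathcal{W}_u} [\mathcal{I}_q(f)] - \frac{1}{2} \log(1-q)$$

and the supremum $\sup_{f \in \partial\mathcal{W}_u} [\mathcal{H}(f) - \mathcal{I}(f)]$ is attained on the same set of functions $\mathcal{F}^*_u \subset \widetilde{\mathcal{W}}$
that optimize $\inf_{f \in \partial\mathcal{W}_u} [\mathcal{I}_q(f)]$. Then,

$$\sup_{f \in \mathcal{W}} [\mathcal{H}(f) - \mathcal{I}(f)] = \sup_{0 \le u \le 1} \sup_{f \in \partial\mathcal{W}_u} [\mathcal{H}(f) - \mathcal{I}(f)] = \sup_{0 \le u \le 1} \left[ \frac{\beta}{6} u^{3\alpha} - \inf_{f \in \partial\mathcal{W}_u} [\mathcal{I}_q(f)] - \frac{1}{2} \log(1-q) \right].$$

For each $u \in (0,1)$, let $f_u \in \mathcal{F}^*_u$. Let $v^*$ maximize the RHS of (2.23). Then

$$v^* = \arg\sup_{0 \le u \le 1} \left[ \frac{\beta}{6} u^{3\alpha} - \mathcal{I}_q(f_u) - \frac{1}{2} \log(1-q) \right] = \arg\sup_{0 \le u \le 1} \left[ \frac{\beta}{6} u^{3\alpha} - \inf_{f \in \partial\mathcal{W}_u} [\mathcal{I}_q(f)] - \frac{1}{2} \log(1-q) \right].$$

It follows that

$$\sup_{f \in \mathcal{W}} [\mathcal{H}(f) - \mathcal{I}(f)] = \frac{\beta}{6} (v^*)^{3\alpha} - \mathcal{I}_q(f_{v^*}) - \frac{1}{2} \log(1-q)$$

and moreover, the supremum $\sup_{f \in \mathcal{W}} [\mathcal{H}(f) - \mathcal{I}(f)]$ is attained by any $f_{v^*} \in \mathcal{F}^*_{v^*}$. This
concludes the proof of (2.23).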

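Now suppose $(q, v^*)$ belongs to the replica symmetric phase. This implies that the
constant function $v^*$ is the unique minimizer of the LDP rate function $\inf_{f \in \mathcal{W}_{v^*}} [\mathcal{I}_q(f)]$,
and by Theorem 4.2(iii) in [8],

$$I_q(v^*) = \inf_{f \in \mathcal{W}_{v^*}} [\mathcal{I}_q(f)] = \inf_{f \in \partial\mathcal{W}_{v^*}} [\mathcal{I}_q(f)].$$

Since $I_q(u) = I(u) - \frac{h_q}{2} u - \frac{1}{2} \log(1-q)$, we have that the free energy is

$$\lim_{n \to \infty} \psi_n^{h_q,\beta,\alpha} = \frac{\beta}{6} (v^*)^{3\alpha} - \inf_{f \in \partial\mathcal{W}_{v^*}} [\mathcal{I}_q(f)] - \frac{1}{2} \log(1-q) = \frac{\beta}{6} (v^*)^{3\alpha} - I_q(v^*) - \frac{1}{2} \log(1-q) = \frac{h_q}{2} v^* + \frac{\beta}{6} (v^*)^{3\alpha} - I(v^*) = V(v^*).$$

Moreover, we claim that $V(v^*) = \sup_{0 \le u \le 1} [V(u)]$. To see this, notice that if the optimizer
of $\sup_{f \in \mathcal{W}} [\mathcal{H}(f) - \mathcal{I}(f)]$ is a constant function, then it suffices to consider the supremum
only over constant functions. But the only constant function in $\partial\mathcal{W}_u$ is the function $u$. So

$$\sup_{f \in \mathcal{W}} [\mathcal{H}(f) - \mathcal{I}(f)] = \sup_{f \in \mathcal{C}} [\mathcal{H}(f) - \mathcal{I}(f)] = \sup_{0 \le u \le 1} [\mathcal{H}(u) - I(u)] = \sup_{0 \le u \le 1} [V(u)]$$

where $\mathcal{C} \subset \mathcal{W}$ is the set of constant functions. The proof is complete. ■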

Remark 2.6. Recalling the LDP rate $\phi(q,u)$ defined in (2.9), [8, Theorem 4.2 (iii)] showed
that for $u \ge q$

$$\phi(q,u) = \inf_{f \in \mathcal{W}_u} [\mathcal{I}_q(f)] = \inf_{f \in \partial\mathcal{W}_u} [\mathcal{I}_q(f)] = \phi_q(u)$$

and the set of minimizers that attain the rate $\phi(q,u)$ is exactly $\mathcal{F}^*_u$. So, for any $u \ge q$,
the Erdos-Renyi graph $\mathcal{G}_{n,q}$ conditioned on the event $\{\mathcal{T}(f) \ge u^3\}$ is asymptotically
indistinguishable from the minimal set $\mathcal{F}^*_u$, by Lemma 2.1. On the other hand, since the
set $\mathcal{F}^*_{v^*}$ attains the supremum, $\sup_{f \in \mathcal{W}} [\mathcal{H}(f) - \mathcal{I}(f)]$, in Theorem 2.5, it follows from [5,
Theorem 3.2] and Lemma A.1 that the exponential random graph $\mathcal{G}_n^{h_q,\beta,\alpha}$ is asymptotically
indistinguishable from the minimal set $\mathcal{F}^*_{v^*}$. We have the following corollary.

Corollary 2.7. Let the parameters $(h_q, \beta, \alpha)$, $(q, v^*)$ be as in Theorem 2.5 and Eqn (2.23).
Suppose $v^* \ge q$. Then $\mathcal{G}_n^{h_q,\beta,\alpha}$ is asymptotically indistinguishable from the conditioned
Erdos-Renyi graph, $\mathcal{G}_{n,q}$ conditioned on the event $\{\mathcal{T}(f) \ge (v^*)^3\}$.

In particular, if $(q, v^*)$ belongs to the replica symmetric phase, then $\mathcal{G}_n^{h_q,\beta,\alpha}$ is
asymptotically indistinguishable from the Erdos-Renyi graph $\mathcal{G}_{n,v^*}$.

The mean behaviour of the triangle density of an exponential random graph $\mathcal{G}_n^{h_q,\beta,\alpha}$ can
be deduced from the variational formulation in (2.23), and in special instances, so can the
mean behaviour of the edge density. This is shown in the next proposition.

Proposition 2.8. Given $(h_q, \beta, \alpha)$ as in Theorem 2.5, if the supremum in (2.23) is
attained at a unique point $v^* \in [0,1]$, then

$$\lim_{n \to \infty} \mathbb{E}\left| \mathcal{T}(\mathcal{G}_n^{h_q,\beta,\alpha}) - (v^*)^3 \right| = 0. \tag{2.27}$$

Further, if $(q, v^*)$ belongs to the replica symmetric phase, then

$$\lim_{n \to \infty} \mathbb{E}\left| \mathcal{E}(\mathcal{G}_n^{h_q,\beta,\alpha}) - v^* \right| = 0. \tag{2.28}$$

Proof. This follows from [5, Theorem 4.2] and the Lipschitz continuity of the mappings
$f \mapsto \mathcal{T}(f)$ and $f \mapsto \mathcal{E}(f)$ under the cut distance metric $\delta_\square$ [3, Theorem 3.7]. The proof is
left to the appendix. ■

2.2. Triangle and edge tilts. In this section, we use the variational form of the free
energy, (2.23), to construct the triangle tilts for the importance sampling scheme (see
Definition 2.12). In order to define the triangle tilts, and in view of Theorem 2.5, we must
characterize the $(p,t)$ regime where there exists a Gibbs measure satisfying

$$t = \arg\sup_{0 \le u \le 1} \left[ \frac{\beta}{6} u^{3\alpha} - \inf_{f \in \partial\mathcal{W}_u} [\mathcal{I}_p(f)] - \frac{1}{2} \log(1-p) \right] = \arg\sup_{0 \le u \le 1} \left[ \frac{\beta}{6} u^{3\alpha} - \phi_p(u) \right] \tag{2.29}$$

where $\phi_p(u) = \inf_{f \in \partial\mathcal{W}_u} [\mathcal{I}_p(f)]$. Since $\mathcal{I}_p(f)$ is the rate function, it is known that $\phi_p(p) = 0$
and $\phi_p(u)$ is continuous and strictly increasing on $[p,1]$ (Theorem 4.3 in [8]).

If $\phi_p(u)$ is differentiable everywhere, then the extremal points $u^*$ of the function
$V(u) = \frac{\beta}{6} u^{3\alpha} - \phi_p(u)$ satisfy

$$V'(u^*) = \frac{\alpha\beta}{2} (u^*)^{3\alpha - 1} - \phi_p'(u^*) = 0.$$

Then for (2.29) to hold, $\beta$ must necessarily be given by

$$\beta = \frac{2 \phi_p'(t)}{\alpha\, t^{3\alpha - 1}}.$$



The next lemma shows that, regardless of the differentiability of $\phi_p(t)$, provided a certain
minorant condition holds, we can find a $\beta$ and a sufficiently small $\alpha$ so that (2.29) holds,
and consequently that the exponential graph is asymptotically indistinguishable from the
conditioned Erdos-Renyi graph.

We shall say that $(p,t)$ satisfies the minorant condition with parameter $\alpha$ if $(t^{3\alpha}, \phi_p(t))$
lies on the convex minorant of the function $x \mapsto \phi_p(x^{1/(3\alpha)})$. If $(t^{3\alpha}, \phi_p(t))$ lies on the convex
minorant of $x \mapsto \phi_p(x^{1/(3\alpha)})$, then subdifferential(s) of the convex minorant of $x \mapsto \phi_p(x^{1/(3\alpha)})$
always exist and are positive. Recall that the subdifferentials of a convex function $f(x)$
at a point $x$ are the slopes of any line lying below $f(x)$ that is tangent to $f$ at $x$. The set
of subdifferentials of a convex function is non-empty; if the function is differentiable at $x$,
then the set of subdifferentials contains exactly one point, the derivative $f'(x)$.
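
The minorant condition can be checked numerically on a grid. The sketch below is only an illustration under a strong assumption: since $\phi_p$ has no closed form in general, the scalar rate function $I_p(u) = \frac{1}{2}[u \log(u/p) + (1-u)\log((1-u)/(1-p))]$ is used as a stand-in for $\phi_p$ (the two agree where $(p,t)$ is replica symmetric). The function names `rate`, `lower_hull` and `on_minorant`, the grid size and the tolerance are all hypothetical choices, not taken from the paper.

```python
import numpy as np

def rate(u, p):
    """Scalar rate function I_p(u), used here as a stand-in for phi_p (assumption)."""
    return 0.5 * (u * np.log(u / p) + (1.0 - u) * np.log((1.0 - u) / (1.0 - p)))

def lower_hull(x, y):
    """Indices of the vertices of the convex minorant of the points (x[i], y[i]).

    x must be strictly increasing (monotone-chain lower hull).
    """
    hull = []
    for i in range(len(x)):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (x[i] - x[a])
            if cross <= 0.0:   # point b lies on or above the chord from a to i
                hull.pop()
            else:
                break
        hull.append(i)
    return np.array(hull)

def on_minorant(p, t, alpha, grid_size=4001, tol=1e-9):
    """Check whether (t^(3 alpha), I_p(t)) lies on the convex minorant of x -> I_p(x^(1/(3 alpha)))."""
    s = np.linspace(1e-6, 1.0 - 1e-6, grid_size)   # grid in the original variable u
    x = s ** (3.0 * alpha)                          # rescaled variable x = u^(3 alpha)
    y = rate(s, p)
    idx = lower_hull(x, y)
    minorant_at_t = np.interp(t ** (3.0 * alpha), x[idx], y[idx])
    return bool(rate(t, p) <= minorant_at_t + tol)
```

With $\alpha = 2/3$ the rescaled function is $x \mapsto I_p(\sqrt{x})$; for example, `on_minorant(0.35, 0.4, 2/3)` tests the pair $(p,t) = (0.35, 0.4)$ used in the simulations of Section 4.2.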

Lemma 2.9. Suppose $(p,t)$ satisfies the minorant condition for $\alpha > 0$ sufficiently small.
Let $\frac{\beta}{6}$ be any subdifferential of the convex minorant of $x \mapsto \phi_p(x^{1/(3\alpha)})$ at the point $t^{3\alpha}$.

Then $\sup_{0 \le u \le 1} [\frac{\beta}{6} u^{3\alpha} - \phi_p(u)]$ is maximized at $t$. Moreover, if $\phi_p(u)$ is differentiable at $t$,
then $\beta = \beta^*$, as defined in (2.30).

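Proof. The proof follows a similar technique to [16]. Using the rescaling $u \mapsto x^{1/(3\alpha)}$, the
variational form $\sup_{0 \le u \le 1} [\frac{\beta}{6} u^{3\alpha} - \phi_p(u)]$ can be rewritten as

$$\sup_{0 \le x \le 1} \left[ \frac{\beta}{6} x - \phi_p(x^{1/(3\alpha)}) \right].$$

Let $\hat{\phi}_p(x)$ denote the convex minorant of $x \mapsto \phi_p(x^{1/(3\alpha)})$. The assumption that $\frac{\beta}{6}$ is a
subdifferential of $\hat{\phi}_p(x)$ at $x = t^{3\alpha}$ implies that the maximum of $\sup_x [\frac{\beta}{6} x - \hat{\phi}_p(x)]$ is
attained at $t^{3\alpha}$. Since, for sufficiently small $\alpha$, the point $(t^{3\alpha}, \phi_p(t))$ lies on $\hat{\phi}_p(x)$, we
have that $\hat{\phi}_p(t^{3\alpha}) = \phi_p(t)$ and so the maximum of $\sup_x [\frac{\beta}{6} x - \phi_p(x^{1/(3\alpha)})]$ is also attained
at $t^{3\alpha}$. It follows that the maximum of $\sup_u [\frac{\beta}{6} u^{3\alpha} - \phi_p(u)]$ is attained at $t$. (However,
this maximum may not be unique. If the subtangent line defined by the subdifferential $\frac{\beta}{6}$
touches $\hat{\phi}_p$ at another point $r^{3\alpha}$, then $r$ is also a maximum.)

To prove the last part of the lemma, if $\phi_p(u)$ is differentiable at $t$, then the subdifferential
is simply the derivative. Then we have that

$$\frac{d}{dx} \left[ \frac{\beta}{6} x - \phi_p(x^{1/(3\alpha)}) \right]_{x = t^{3\alpha}} = \frac{\beta}{6} - \frac{\phi_p'(t)}{3\alpha}\, t^{1-3\alpha} = 0$$

implies that $\beta = \beta^*$.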

Next, we use the minorant condition and Lemma 2.9 to define a parameterized family
of subregimes of the $(p,t)$-phase space.

Definition 2.10. Fix $\alpha > 0$. We define the regime $S_\alpha$ to be the set of pairs $(p,t)$ for
which the minorant condition holds with $\alpha$ and there exists a subdifferential $\frac{\beta}{6}$ of the
convex minorant of $x \mapsto \phi_p(x^{1/(3\alpha)})$ such that the variational form $\sup_{0 \le u \le 1} [\frac{\beta}{6} u^{3\alpha} - \phi_p(u)]$ is
uniquely maximized at $t$. ■

If $\alpha \in [2/3,1]$, the exponential random graph is known to be asymptotically
indistinguishable from an Erdos-Renyi graph $\mathcal{G}_{n,u}$ for some $u \in [0,1]$. Recalling Definition 2.2 of
the replica symmetric phase, the following statement follows directly from the arguments
in [16] and Theorem 4.3 in [8].

Lemma 2.11. $S_{2/3}$ is exactly the replica symmetric phase.

Lemma A.2 shows that $S_\alpha \supset S_{\alpha'}$ for $0 < \alpha < \alpha'$. The sets $S_\alpha$ for $\alpha = 2/3, 1$ are shown
in Figure 2.1. Notice also from Figure 2.1 that there exists a critical value $p_{crit}$ such
that when $p \ge p_{crit}$, $(p,t)$ is replica symmetric for all $t \in [p,1]$; whereas when $p < p_{crit}$,
there exists an interval $[\underline{r}_p, \overline{r}_p] \subset (p,1)$ where $(p,t)$ is replica breaking if $t \in [\underline{r}_p, \overline{r}_p]$ and
$(p,t)$ is replica symmetric for all other values of $t$.

Figure 2.1. $S_1$ is the dark gray region to the right of the solid curve
(not including the solid curve). $S_{2/3}$, the replica symmetric phase, is the
light gray region to the right of the dotted curve (not including the dotted
curve), together with the dark gray region. The line is $t = p$.

By definition, any replica symmetric $(p,t)$ satisfies the minorant condition for any $\alpha \in
[2/3,1]$. Are there any replica breaking $(p,t)$ that satisfy the minorant condition for some
$\alpha$? The answer is in the affirmative. To see this, consider $\alpha = 1/3$ and the convex minorant of
$x \mapsto \phi_p(x^{1/(3\alpha)}) = \phi_p(x)$. For each $p < p_{crit}$, there exists an interval $[\underline{r}_p, \overline{r}_p] \subset (p,1)$ where
$(p,t)$ is replica breaking if $t \in [\underline{r}_p, \overline{r}_p]$ and $(p,t)$ is replica symmetric for the other values
of $t$. Since $\phi_p(t) < I_p(t)$ if $t \in [\underline{r}_p, \overline{r}_p]$ and $\phi_p(t) = I_p(t)$ for other values of $t$, and since
$I_p(u)$ is convex, the convex minorant of $\phi_p(x)$ must touch $\phi_p$ at at least one $t_p \in [\underline{r}_p, \overline{r}_p]$.
So $(p, t_p)$ is replica breaking and satisfies the minorant condition.

The preceding argument shows that $\bigcup_{\alpha > 0} S_\alpha$ is strictly larger than the replica
symmetric phase, and contains a nontrivial subset of the replica breaking phase. Using the
characterizations of the sets $S_\alpha$, we are now ready to define the triangle tilts.

Definition 2.12. Given $(p,t) \in S_\alpha$ for some $\alpha > 0$, a triangle tilt with parameter $\alpha$
corresponding to $(p,t)$ refers to any Gibbs measure $\mathbb{Q}_n^{h_p,\beta,\alpha}$ where $h_p = \log\frac{p}{1-p}$, and $\frac{\beta}{6}$ is
any subdifferential of the convex minorant of $x \mapsto \phi_p(x^{1/(3\alpha)})$. If $\phi_p(u)$ is differentiable at
$t$, then there is exactly one triangle tilt with parameter $\alpha$ corresponding to $(p,t)$, with the
parameters $(h_p, \beta^*, \alpha)$ where $\beta^*$ is defined in (2.30). ■

The triangle tilt with parameter $\alpha$ corresponding to $(p,t)$ is well-defined only when
$(p,t) \in S_\alpha$, or, equivalently stated, it is well-defined only when (2.29) holds. In view of
Theorem 2.5 and Lemma 2.9, when the triangle tilt with parameter $\alpha$ corresponding to
$(p,t)$ is well-defined, it is the measure induced by an exponential random graph which
satisfies (2.27) and which is asymptotically indistinguishable from the conditioned
Erdos-Renyi graph $\mathcal{G}_{n,p}$ conditioned on the rare event $\{\mathcal{T}(f) \ge t^3\}$. Also, if $(p,t) \in S_\alpha$, then since
by Lemma A.2 the sets $S_{\alpha'}$ are increasing as $\alpha'$ decreases, the triangle tilt with parameter
$\alpha'$ corresponding to $(p,t)$ is defined for any $\alpha' \le \alpha$. If $(p,t)$ is in the replica symmetric
phase, the triangle tilt can be defined for some $\alpha \in [2/3,1]$, and since $\phi_p(t) = I_p(t)$ in the
replica symmetric phase, from (2.30) the triangle tilt parameters necessarily take on the
following explicit expression: $(h_p, \beta^*, \alpha)$, where

$$\beta^* = \frac{h_t - h_p}{\alpha t^{3\alpha-1}}. \qquad (2.31)$$

If $(p,t)$ is in the replica breaking phase, we may need to resort to numerical strategies to
find the parameters $\beta$ and $\alpha$.

Remark 2.13. Given any $(p,t)$, if $\phi_p(u)$ is differentiable at $t$, then we can define $\beta^*$ in
(2.30) regardless of whether $(p,t)$ belongs to $S_\alpha$. In this case, $t$ is a stationary point of
the function $L(u) : u \mapsto \frac{\beta^*}{6} u^{3\alpha} - \phi_p(u)$. If $\phi_p(u)$ is twice differentiable at $t$, then since

$$L''(t) = -\frac{\alpha t^{3\alpha-1}}{2}\, \partial_t \beta^*,$$

we have that $t$ is a local maximum of $L(u)$ if and only if $\partial_t \beta^* > 0$.

Now note that in the replica symmetric phase, where the LDP implies that an
Erdos-Renyi random graph $\mathcal{G}_{n,p}$ conditioned on the event $\{\mathcal{T}(f) \ge t^3\}$ is indistinguishable from
$\mathcal{G}_{n,t}$, we have the obvious edge tilt as follows.

Definition 2.14. Given $(p,t)$, let $h_t = \log\frac{t}{1-t}$. The edge tilt refers to the Gibbs measure
$\mathbb{Q}_n^{h_t,0,\alpha} = \mathbb{Q}_n^{h_t,0} = \mathbb{P}_{n,t}$ corresponding to the Erdos-Renyi graph $\mathcal{G}_{n,t}$.

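It is also possible to consider tilts that are a hybrid between the edge tilt and triangle tilts
and that satisfy (2.27). Such tilts can be constructed explicitly for the replica symmetric
phase. Consider the extremal points of the function

$$V(u) = V(u; h, \beta, \alpha) = \frac{h}{2} u + \frac{\beta}{6} u^{3\alpha} - I(u). \qquad (2.32)$$

If the maximum of $V(u)$ occurs at $u^*$, we must have

$$V'(u^*) = \frac{h}{2} + \frac{\alpha\beta}{2} (u^*)^{3\alpha-1} - \frac{1}{2} \log\frac{u^*}{1-u^*} = 0, \qquad (2.33)$$

and

$$V''(u^*) = \frac{\alpha\beta(3\alpha-1)}{2} (u^*)^{3\alpha-2} - \frac{1}{2u^*(1-u^*)} \le 0. \qquad (2.34)$$

Using (2.33) we may express $h$ as a function of $\beta$ and $\alpha$:

$$h(\beta, \alpha) = \log\frac{u^*}{1-u^*} - \beta\alpha (u^*)^{3\alpha-1}. \qquad (2.35)$$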

The next lemma follows from the continuity of $V$ and the conditions (2.33), (2.34).

Lemma 2.15. Let $u^* \in (0,1)$ and fix $\alpha \in [2/3,1]$. For $\beta \ge 0$, let $h(\beta,\alpha)$ be
defined by (2.35). There exists $\beta_0 > 0$, depending on $\alpha$, such that for all $\beta \in [0, \beta_0)$,
$V(u; h(\beta,\alpha), \beta, \alpha)$ attains a global maximum uniquely at the point $u = u^*$. In particular,
the family of exponential random graphs $\mathcal{G}_n^{h(\beta,\alpha),\beta,\alpha}$ with $\beta \in [0, \beta_0)$ are asymptotically
indistinguishable from the Erdos-Renyi graph $\mathcal{G}_{n,u^*}$ with edge probability $u^*$.
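
As a numerical illustration of Lemma 2.15, the following sketch evaluates $h(\beta,\alpha) = \log\frac{u^*}{1-u^*} - \beta\alpha(u^*)^{3\alpha-1}$ and locates the maximizer of $V(u; h, \beta, \alpha) = \frac{h}{2}u + \frac{\beta}{6}u^{3\alpha} - I(u)$ on a grid, taking $I(u) = \frac{1}{2}[u \log u + (1-u)\log(1-u)]$. The function names, the grid resolution and the sample values $u^* = 0.4$, $\beta = 0.5$ are assumptions chosen for illustration; a grid search only suggests, and does not prove, uniqueness of the maximizer.

```python
import numpy as np

def entropy_term(u):
    """I(u) = (1/2)[u log u + (1 - u) log(1 - u)]."""
    return 0.5 * (u * np.log(u) + (1.0 - u) * np.log(1.0 - u))

def hybrid_h(beta, alpha, u_star):
    """h(beta, alpha) chosen so that u_star is a stationary point of V."""
    return np.log(u_star / (1.0 - u_star)) - beta * alpha * u_star ** (3.0 * alpha - 1.0)

def profile(u, h, beta, alpha):
    """V(u; h, beta, alpha) = (h/2) u + (beta/6) u^(3 alpha) - I(u)."""
    return 0.5 * h * u + (beta / 6.0) * u ** (3.0 * alpha) - entropy_term(u)

def grid_argmax(h, beta, alpha, grid_size=100001):
    """Grid point in (0, 1) at which V is largest."""
    u = np.linspace(1e-6, 1.0 - 1e-6, grid_size)
    return float(u[np.argmax(profile(u, h, beta, alpha))])

u_star, beta = 0.4, 0.5
for alpha in (2.0 / 3.0, 1.0):
    h = hybrid_h(beta, alpha, u_star)
    print(alpha, h, grid_argmax(h, beta, alpha))
```

For these small values of $\beta$ the profile is strictly concave, so the grid maximizer coincides with $u^*$ up to the grid spacing.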

When $(p,t)$ belongs to the replica symmetric phase, we can apply Lemma 2.15 with $u^* =
t$ to obtain a family of hybrid tilts with the parameters $(h(\beta,\alpha), \beta, \alpha)$ for $\beta \in [0, \beta_0)$. Due
to Theorem 2.4(b), the hybrid tilt satisfies (2.27) and is asymptotically indistinguishable
from the Erdos-Renyi graph $\mathcal{G}_{n,t}$. Hybrid tilts of this form are considered in the numerical
simulations in Section 4.2.

3. Asymptotic Optimality in the replica symmetric phase

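The reason for the names, triangle tilt or edge tilt, is that the Radon-Nikodym
derivative, $\frac{d\mathbb{P}}{d\mathbb{Q}}$, that weights the samples in the importance sampling estimator (2.3) depends
only on the number of triangles or the number of edges, respectively, in the samples. That
is, up to normalizing constants that do not depend on $X$,

$$\frac{d\mathbb{P}_{n,p}}{d\mathbb{Q}_n^{h_p,\beta,\alpha}}(X) \propto \exp\left\{ -n^2 \frac{\beta}{6} \mathcal{T}(X)^\alpha \right\}, \qquad \frac{d\mathbb{P}_{n,p}}{d\mathbb{Q}_n^{h_t,0}}(X) \propto \exp\left\{ n^2 \frac{h_p - h_t}{2} \mathcal{E}(X) \right\}.$$

Here recall that $\mathcal{T}(X) = \frac{6}{n^3} T(X)$ is the density of triangles in $X$ and $\mathcal{E}(X) = \frac{2}{n^2} E(X)$ is
the density of edges.

In the case of the edge tilt, the fact that the weights depend only on the number of
edges has deeper repercussions. Since $\mathbb{E}[\mathcal{E}(\mathcal{G}_n^{h_t,0})] \approx t$, good samples in the target event
$\{\mathcal{T}(f) \ge t^3\}$ having fewer than $t$ density of edges are being over-penalized by the weights.
In contrast, the triangle tilt penalizes samples more heavily only when they deviate from
$t^3$ density of triangles.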

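To formalize the above heuristic arguments, we study the asymptotic optimality, or
non-optimality, of importance sampling schemes based on the tilted measures $\mathbb{Q}_n^{h,\beta,\alpha}$. For
any admissible parameters $(h,\beta,\alpha)$, the importance sampling estimator based on the tilted
measure $\mathbb{Q}_n^{h,\beta,\alpha}$ is

$$M_K = \frac{1}{K} \sum_{k=1}^{K} 1_{\mathcal{W}_t}(X_k) \frac{d\mathbb{P}_{n,p}}{d\mathbb{Q}_n^{h,\beta,\alpha}}(X_k)
= \frac{1}{K} \sum_{k=1}^{K} 1_{\mathcal{W}_t}(X_k) \exp\left\{ n^2 \left( \frac{h_p - h}{2} \mathcal{E}(X_k) - \frac{\beta}{6} \mathcal{T}(X_k)^\alpha + \psi_n^{h,\beta,\alpha} - \psi_n^{h_p,0,\alpha} \right) \right\} \qquad (3.1)$$

where $X_k$ are i.i.d. samples drawn from $\mathbb{Q}_n^{h,\beta,\alpha}$. Denote

$$q_n = q_n(X) = 1_{\mathcal{W}_t}(X) \frac{d\mathbb{P}_{n,p}}{d\mathbb{Q}_n^{h,\beta,\alpha}}(X).$$

For any $(h,\beta,\alpha)$, $\mathbb{E}_{\mathbb{Q}}[q_n] = \mu_n$ and so $M_K$ is an unbiased estimator for $\mu_n$.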
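
To make the estimator concrete, here is a minimal sketch for the special case of the edge tilt, where the tilted measure is the Erdos-Renyi law with edge probability $t$ and i.i.d. samples are easy to draw. In that case the likelihood ratio is explicit: $(p/t)^{E(X)} ((1-p)/(1-t))^{\binom{n}{2} - E(X)}$. The function names and the small sizes used below are illustrative assumptions; this is not the Glauber-dynamics implementation used for the general tilts.

```python
import math
import numpy as np

def sample_er(n, prob, rng):
    """Adjacency matrix of an Erdos-Renyi graph on n vertices with edge probability prob."""
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return (upper | upper.T).astype(np.int64)

def triangle_count(adj):
    """Number of triangles, computed as trace(A^3) / 6."""
    return int(np.trace(adj @ adj @ adj)) // 6

def edge_tilt_estimate(n, p, t, num_samples, rng):
    """Importance sampling estimate of P(T(G_{n,p}) >= n^3 t^3 / 6), sampling from G_{n,t}."""
    threshold = n ** 3 * t ** 3 / 6.0            # triangle density at least t^3
    num_pairs = n * (n - 1) // 2
    log_w_edge = math.log(p / t)                  # log weight contributed by each edge
    log_w_non_edge = math.log((1.0 - p) / (1.0 - t))
    total = 0.0
    for _ in range(num_samples):
        adj = sample_er(n, t, rng)
        if triangle_count(adj) >= threshold:
            edges = int(adj.sum()) // 2
            total += math.exp(edges * log_w_edge + (num_pairs - edges) * log_w_non_edge)
    return total / num_samples

rng = np.random.default_rng(0)
print(edge_tilt_estimate(8, 0.35, 0.4, 2000, rng))
```

Setting `t` equal to `p` makes every weight equal to one, so the same function then reduces to crude Monte Carlo.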

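We now prove the asymptotic optimality of the triangle tilts, Theorem 1.1.

Proof of Theorem 1.1. Due to (2.13), it suffices to show that

$$\lim_{n\to\infty} \frac{1}{n^2} \log \mathbb{E}_{\mathbb{Q}}[q_n^2] \le -2 \inf_{f \in \mathcal{W}_t} [\mathcal{I}_p(f)]. \qquad (3.2)$$

Note that $\mathcal{E}, \mathcal{T} : \mathcal{W} \to \mathbb{R}$ are bounded continuous mappings [3, Theorem 3.8], and the
exponent of the indicator $1_{\mathcal{W}_t}(X) = e^{-n^2 O_{\mathcal{W}_t}(X)}$, where $O_{\mathcal{W}_t}(X) = 0$ if $X \in \mathcal{W}_t$ and
$O_{\mathcal{W}_t}(X) = \infty$ otherwise, can be approximated by bounded continuous approximations.
Since $\mathcal{I}_p(f)$ is the rate function for the family of measures $\mathbb{P}_{n,p}$ (Theorem 3.1 of [5]), we
may apply the Laplace principle: for any $(h,\beta,\alpha)$,

$$\begin{aligned}
\lim_{n\to\infty} \frac{1}{n^2} \log \mathbb{E}_{\mathbb{Q}_n}[q_n^2]
&= \lim_{n\to\infty} \frac{1}{n^2} \log \mathbb{E}_{\mathbb{P}_{n,p}}\left[ 1_{\mathcal{W}_t} \frac{d\mathbb{P}_{n,p}}{d\mathbb{Q}_n^{h,\beta,\alpha}} \right] \\
&= -\inf_{f \in \mathcal{W}} \left[ \mathcal{I}_p(f) + O_{\mathcal{W}_t}(f) + \frac{h - h_p}{2} \mathcal{E}(f) + \frac{\beta}{6} \mathcal{T}(f)^\alpha \right] + \lim_{n\to\infty} \psi_n^{h,\beta,\alpha} + \frac{1}{2}\log(1-p) \\
&= -\inf_{f \in \mathcal{W}_t} \left[ \mathcal{I}_p(f) + \frac{h - h_p}{2} \mathcal{E}(f) + \frac{\beta}{6} \mathcal{T}(f)^\alpha \right] + V(u^*) + \frac{1}{2}\log(1-p),
\end{aligned} \qquad (3.3)$$

where, by (2.23) and writing $h = h_q$,

$$V(u) := \frac{\beta}{6} u^{3\alpha} - \inf_{f \in \partial\mathcal{W}_u} [\mathcal{I}_q(f)] - \frac{1}{2}\log(1-q),$$

and $u^* = \arg\sup_{0 \le u \le 1} [V(u)]$. Then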



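$$\begin{aligned}
\lim_{n\to\infty} \frac{1}{n^2} \log \mathbb{E}_{\mathbb{Q}_n}[q_n^2]
&= -\inf_{f \in \mathcal{W}_t} \left[ \mathcal{I}_p(f) + \frac{h - h_p}{2} \mathcal{E}(f) + \frac{\beta}{6} \mathcal{T}(f)^\alpha \right] + V(u^*) + \frac{1}{2}\log(1-p) \\
&\le -\inf_{f \in \mathcal{W}_t} \left[ \mathcal{I}_p(f) + \frac{h - h_p}{2} \mathcal{E}(f) \right] - \frac{\beta}{6} t^{3\alpha} + V(u^*) + \frac{1}{2}\log(1-p)
\end{aligned} \qquad (3.4)$$

for any $(h,\beta,\alpha)$. The last inequality follows from the fact that $\mathcal{T}(f) \ge t^3$ for all $f \in \mathcal{W}_t$.
Now, taking the triangle tilt with $(h_p, \beta, \alpha)$, we have by its definition that $u^* = t$. Then

$$\lim_{n\to\infty} \frac{1}{n^2} \log \mathbb{E}_{\mathbb{Q}_n}[q_n^2] \le -\inf_{f \in \mathcal{W}_t} [\mathcal{I}_p(f)] - \inf_{f \in \partial\mathcal{W}_t} [\mathcal{I}_p(f)] = -2 \inf_{f \in \mathcal{W}_t} [\mathcal{I}_p(f)].$$

Combined with the upper bound for the asymptotic second moment, we conclude that the
triangle tilt $\mathbb{Q}_n^{h_p,\beta,\alpha}$ yields an asymptotically optimal importance sampling estimator.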



3.1. Non-optimality. In this section, we show the non-optimality of importance
sampling estimators with certain tilted measures. In the first result, we show that an
exponential random graph that is not indistinguishable from the conditioned Erdos-Renyi graph
cannot produce an optimal estimator. In the case where $(p,t)$ belongs to the replica
symmetric phase, this rules out all exponential random graphs that are indistinguishable from
$\mathcal{G}_{n,u}$, with $u \ne t$, from being asymptotically optimal, but does not rule out the
Erdos-Renyi graph $\mathcal{G}_{n,t}$ corresponding to the edge tilt. Then, the second non-optimality result
identifies a non-trivial subset of the replica symmetric phase for which the edge tilt does
not produce an optimal estimator.



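Proposition 3.1. Given $(p,t)$. Let $\mathbb{Q}_n^{h_p,\beta,\alpha}$ be such that the variational form
$\sup_{0 \le u \le 1} \left[ \frac{\beta}{6} u^{3\alpha} - \inf_{f \in \partial\mathcal{W}_u} [\mathcal{I}_p(f)] \right]$ attains its maximum at $u^* \ne t$. Then the importance
sampling scheme based on the Gibbs measure tilt $\mathbb{Q}_n^{h_p,\beta,\alpha}$ is not asymptotically optimal.

Proof. Let $f^*$ be any minimizer of the LDP rate function, $\inf_{f \in \mathcal{W}_t} [\mathcal{I}_p(f)]$. Theorem 2.5
implies that $f^*$ does not maximize $\sup_{f \in \mathcal{W}} [\mathcal{H}(f) - \mathcal{I}(f)]$. From (3.3) and the first equality
of (2.23),

$$\begin{aligned}
\lim_{n\to\infty} \frac{1}{n^2} \log \mathbb{E}_{\mathbb{Q}_n}[q_n^2]
&= -\inf_{f \in \mathcal{W}_t} \left[ \mathcal{I}_p(f) + \frac{h - h_p}{2} \mathcal{E}(f) + \frac{\beta}{6} \mathcal{T}(f)^\alpha \right] + \lim_{n\to\infty} \psi_n^{h,\beta,\alpha} + \frac{1}{2}\log(1-p) \\
&= -\inf_{f \in \mathcal{W}_t} \left[ \mathcal{I}_p(f) + \frac{h - h_p}{2} \mathcal{E}(f) + \frac{\beta}{6} \mathcal{T}(f)^\alpha \right] + \sup_{f \in \mathcal{W}} [\mathcal{H}(f) - \mathcal{I}(f)] + \frac{1}{2}\log(1-p) \\
&> -\left[ \mathcal{I}_p(f^*) + \frac{h - h_p}{2} \mathcal{E}(f^*) + \frac{\beta}{6} \mathcal{T}(f^*)^\alpha \right] + \frac{h}{2} \mathcal{E}(f^*) + \frac{\beta}{6} \mathcal{T}(f^*)^\alpha - \mathcal{I}(f^*) + \frac{1}{2}\log(1-p) \\
&= -2\mathcal{I}_p(f^*) = -2 \inf_{f \in \mathcal{W}_t} [\mathcal{I}_p(f)].
\end{aligned}$$

The importance sampling estimator is not asymptotically optimal.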



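Proposition 3.2. Let $0 < p < \frac{e^{-1/2}}{1 + e^{-1/2}}$ and $t \in (p,1)$. If $t$ is sufficiently close to 1 and
$(p,t)$ belongs to the replica symmetric phase, then the importance sampling scheme based
on the edge tilt $\mathbb{Q}_n^{h_t,0}$ is not asymptotically optimal.

Proof. Starting from (3.4), we have

$$\lim_{n\to\infty} \frac{1}{n^2} \log \mathbb{E}_{\mathbb{Q}}[q_n^2] = -\inf_{f \in \mathcal{W}_t} \left[ \mathcal{I}_p(f) + \frac{\beta}{6} \mathcal{T}(f) + \frac{h(\beta) - h_p}{2} \mathcal{E}(f) \right] - I_p(t) + \frac{\beta}{6} t^3 + \frac{h(\beta) - h_p}{2} t, \qquad (3.5)$$

where $h(\beta) = h(\beta, 1)$ as in (2.35) with $\alpha = 1$. Because $(p,t)$ is in the replica symmetric
phase, $\mathcal{I}_p(f)$ is minimized by the constant function

$$f_t(x,y) \equiv t = \arg\inf_{f \in \mathcal{W}_t} [\mathcal{I}_p(f)].$$

On the other hand, $\mathcal{E}$ is minimized by

$$f_1(x,y) = 1_{[0,t]^2}(x,y) = \arg\inf_{f \in \mathcal{W}_t} [\mathcal{E}(f)]. \qquad (3.6)$$

This $f_1$ represents a graph with a large clique, in which there is a complete subgraph on
a fraction $t$ of the vertices. Let us define

$$\Gamma(t) = \mathcal{I}_p(f_t) + \frac{\beta}{6} \mathcal{T}(f_t) + \frac{h(\beta) - h_p}{2} \mathcal{E}(f_t) = I_p(t) + \frac{\beta}{6} t^3 + \frac{h(\beta) - h_p}{2} t$$

and

$$\Gamma(1) = \mathcal{I}_p(f_1) + \frac{\beta}{6} \mathcal{T}(f_1) + \frac{h(\beta) - h_p}{2} \mathcal{E}(f_1) = t^2 I_p(1) + (1 - t^2) I_p(0) + \frac{\beta}{6} t^3 + \frac{h(\beta) - h_p}{2} t^2.$$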

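(Recall $h(\beta) = h_t$ here.) From (3.5) we see that

$$\lim_{n\to\infty} \frac{1}{n^2} \log \mathbb{E}_{\mathbb{Q}}[q_n^2] \ge -\Gamma(1) - I_p(t) + \frac{\beta}{6} t^3 + \frac{h(\beta) - h_p}{2} t.$$

We claim that for $p < e^{-1/2}/(1 + e^{-1/2})$ and $t$ sufficiently close to 1, we have $\Gamma(1) < \Gamma(t)$.
Indeed, let $g(t) = \Gamma(1) - \Gamma(t)$:

$$\begin{aligned}
g(t) = \Gamma(1) - \Gamma(t) &= t^2 I_p(1) + (1 - t^2) I_p(0) - I_p(t) + \frac{h_t - h_p}{2} (t^2 - t) \\
&= t^2 I_p(1) + (1 - t^2) I_p(0) - \frac{t}{2} \log\frac{t}{p} - \frac{1-t}{2} \log\frac{1-t}{1-p} + \frac{t^2 - t}{2} \left( \log\frac{t}{1-t} - \log\frac{p}{1-p} \right).
\end{aligned}$$

Observe that $g(1) = 0$ and

$$g'(1) = 2I_p(1) - 2I_p(0) - \frac{1}{2} = -\log\frac{p}{1-p} - \frac{1}{2}.$$

So, if $p < e^{-1/2}/(1 + e^{-1/2})$, we have $g'(1) > 0$. So, for $t$ sufficiently close to 1, we have
$\Gamma(1) < \Gamma(t)$. Therefore,

$$\inf_{f \in \mathcal{W}_t} \left[ \mathcal{I}_p(f) + \frac{\beta}{6} \mathcal{T}(f) + \frac{h(\beta) - h_p}{2} \mathcal{E}(f) \right] \le \Gamma(1) < \Gamma(t),$$

and we conclude that

$$\lim_{n\to\infty} \frac{1}{n^2} \log \mathbb{E}_{\mathbb{Q}}[q_n^2] \ge -\Gamma(1) - I_p(t) + \frac{\beta}{6} t^3 + \frac{h(\beta) - h_p}{2} t > -2 I_p(t) = -2 \inf_{f \in \mathcal{W}_t} [\mathcal{I}_p(f)].$$

Since the strict inequality holds, the importance sampling scheme associated with $\mathbb{Q}_n^{h_t,0}$
cannot be asymptotically optimal.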



Remark 3.3. The critical point in the proposition, $p = \frac{e^{-1/2}}{1 + e^{-1/2}} \approx 0.3775$, corresponds
to $h_p = -1/2$. In consideration of (2.11) and Figure 2.1, we see that the conditions of
the proposition are attainable: if $p$ is below this critical point and $t$ is sufficiently close to 1, then $(p,t)$ will
be in the replica symmetric phase. For example, when $p = 0.35$, we can numerically
approximate the value of $\underline{t} \approx 0.948$, so that whenever $t \in (\underline{t}, 1]$, the edge tilt for $(0.35, t)$ is
not asymptotically optimal.
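
The threshold in the remark can be approximated with a few lines of code. The sketch below assumes that the threshold is the point where $g(t) = \Gamma(1) - \Gamma(t) = t^2 I_p(1) + (1-t^2) I_p(0) - I_p(t) + \frac{h_t - h_p}{2}(t^2 - t)$ changes sign, with $I_p(u) = \frac{1}{2}[u\log(u/p) + (1-u)\log((1-u)/(1-p))]$, and locates that point by bisection. The function names and the bracketing interval $[0.9, 0.99]$ are illustrative choices, not taken from the paper.

```python
import math

def rate(u, p):
    """I_p(u), with the convention 0 log 0 = 0."""
    total = 0.0
    if u > 0.0:
        total += u * math.log(u / p)
    if u < 1.0:
        total += (1.0 - u) * math.log((1.0 - u) / (1.0 - p))
    return 0.5 * total

def logit(x):
    return math.log(x / (1.0 - x))

def gap(t, p):
    """g(t) = Gamma(1) - Gamma(t) for the edge tilt (beta = 0, h = h_t)."""
    return (t * t * rate(1.0, p) + (1.0 - t * t) * rate(0.0, p) - rate(t, p)
            + 0.5 * (logit(t) - logit(p)) * (t * t - t))

def bisect(func, lo, hi, iterations=80):
    """Plain bisection; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_critical = math.exp(-0.5) / (1.0 + math.exp(-0.5))
t_lower = bisect(lambda t: gap(t, 0.35), 0.9, 0.99)
print(p_critical, t_lower)
```

For $t$ between the computed root and 1 the gap is negative, which is the regime $\Gamma(1) < \Gamma(t)$ used in the proof of Proposition 3.2.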

4. Numerical simulations using importance sampling 

We implement the importance sampling schemes to show the optimality properties of
the Gibbs measure tilts in practice. Although we have thus far been considering
importance sampling schemes that draw i.i.d. samples from the tilted measure $\mathbb{Q}$, in practice
it is very difficult to sample independent copies of exponential random graphs. This is
because of the dependencies of the edges in the exponential random graph, unlike the
situation with an Erdos-Renyi graph where the edges are independent. Thus, to implement
the importance sampling scheme, we turn to a Markov chain Monte Carlo method known
as the Glauber dynamics to generate samples from the exponential random graph. The
Glauber dynamics refers to a Markov chain whose stationary distribution is the Gibbs
measure $\mathbb{Q}_n^{h,\beta,\alpha}$. The samples from the Glauber dynamics are used to form the
importance sampling estimator $M_K$ in (3.1). The variance of $M_K$ clearly also depends on the
correlation between the successive samples. However, in this paper, rather than focus on
the effect of correlation on the variance of $M_K$, we instead investigate and compare the
optimality of the importance sampling schemes, and show that importance sampling is a
viable method for moderate values of $n$.

4.1. Glauber dynamics. For the exponential random graph $\mathcal{G}_n^{h,\beta,\alpha}$, the Glauber
dynamics proceeds as follows.

Suppose we have a graph $X = (X_{ij})_{1 \le i < j \le n}$. The graph $X'$ is generated from $X$ via the
following procedure.

1. Choose an edge $X_{ij}$, for some $i < j$, from $X$ uniformly at random.

2. For the new graph $X'$, fix all other edges $X'_{i'j'} = X_{i'j'}$, for $(i',j') \ne (i,j)$.

3. Conditioned on all other edges fixed, pick $X'_{ij} \sim \mathrm{Bern}(\varphi)$, where

$$\varphi = \frac{e^{h + (\beta/n)(L_{ij} + M_{ij})^\alpha (n^3/6)^{1-\alpha}}}{e^{h + (\beta/n)(L_{ij} + M_{ij})^\alpha (n^3/6)^{1-\alpha}} + e^{(\beta/n) M_{ij}^\alpha (n^3/6)^{1-\alpha}}},$$

and where

$$L_{ij} = \sum_{k \ne i,j} X_{ik} X_{jk}, \qquad M_{ij} = \sum_{\{k,l,m\} \not\supseteq \{i,j\}} X_{kl} X_{lm} X_{km},$$

are the number of 2-stars in $X$ with a base at the edge $X_{ij}$, and the number of triangles
in $X$ not involving the edge $X_{ij}$, respectively.

4. If conditioning on $A_J$ is used, check if $X'$ is in $A_J$. If not, revert to $X$.

The conditioning of the Gibbs measure used in step 4 is discussed in Section 4.3.
For the classical exponential random graph with $\alpha = 1$, the probability $\varphi$ in the Glauber
dynamics has a neater expression,

$$\varphi = \frac{e^{h + \beta L_{ij}/n}}{1 + e^{h + \beta L_{ij}/n}}.$$

At each MCMC step, if $X'_{ij} \ne X_{ij}$, then $E(X')$ differs from $E(X)$ by one edge, and $T(X')$
differs from $T(X)$ by $L_{ij}$ triangles. The stationary distribution of the Glauber dynamics
is the Gibbs measure $\mathbb{Q}_n^{h,\beta,\alpha}$ that defines the exponential random graph $\mathcal{G}_n^{h,\beta,\alpha}$. Regarding
the mixing time of the Markov chain, [1] showed that if $(h, \beta, 1)$ has the property that the
unique global maximum of the function $V(u)$, defined in (2.32), is also the unique turning
point, then the mixing time for the Glauber dynamics is $O(n^2 \log n)$. For other values of
$(h, \beta, 1)$, the mixing time is $O(e^n)$.
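
The update rule for $\alpha = 1$ can be sketched in a few lines. The code below is a minimal illustration only: it implements steps 1 to 3 with $\varphi = e^{h + \beta L_{ij}/n} / (1 + e^{h + \beta L_{ij}/n})$, omits the optional conditioning of step 4, and makes no attempt at efficiency. The function names and the dense adjacency-matrix representation are assumptions of this sketch.

```python
import math
import numpy as np

def glauber_step(adj, h, beta, rng):
    """One Glauber update for alpha = 1; adj is a symmetric 0/1 integer matrix, modified in place."""
    n = adj.shape[0]
    i, j = rng.choice(n, size=2, replace=False)   # uniformly random pair of distinct vertices
    l_ij = int(adj[i] @ adj[j])                   # common neighbours = 2-stars based at {i, j}
    prob = 1.0 / (1.0 + math.exp(-(h + beta * l_ij / n)))
    adj[i, j] = adj[j, i] = int(rng.random() < prob)

def run_glauber(n, h, beta, num_steps, seed=0):
    """Run the chain from the empty graph and return the final adjacency matrix."""
    rng = np.random.default_rng(seed)
    adj = np.zeros((n, n), dtype=np.int64)
    for _ in range(num_steps):
        glauber_step(adj, h, beta, rng)
    return adj

adj = run_glauber(20, math.log(0.3 / 0.7), 0.0, 20000)
print(adj.sum() / (20 * 19))   # empirical edge density
```

With $\beta = 0$ the chain reduces to resampling independent edges with probability $e^h/(1+e^h)$, which gives a simple sanity check before turning on the triangle term.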

4.2. Numerical simulations in the replica symmetric phase. The importance
sampling scheme was performed for $p = 0.35$, $t = 0.4$, in the replica symmetric phase, using
the Glauber dynamics. The simulations used the Gibbs measure tilts $\mathbb{Q}_n^{h(\beta_{q,t}),\beta_{q,t}}$, with $\alpha = 1$
and $\beta$ of the form

$$\beta = \beta_{q,t} = \frac{1}{t^2} \left( \log\frac{t}{1-t} - \log\frac{q}{1-q} \right), \quad \text{for } q = 0.35, 0.36, \ldots, 0.4 \qquad (4.1)$$

and $h(\beta_{q,t})$ is given by (2.35). Each of these exponential random graphs $\mathcal{G}_n^{h(\beta_{q,t}),\beta_{q,t}}$ is
indistinguishable from the Erdos-Renyi graph $\mathcal{G}_{n,t}$, and $q = p = 0.35$ is the triangle tilt while
$q = t = 0.4$ is the edge tilt. For the values $p \le q \le t$, Table 4.1 verifies the accuracy of
the importance sampling estimates for $\mu_n := \mathbb{P}(\mathcal{G}_{n,p} \in \mathcal{W}_t)$ using the tilts $\mathbb{Q}_n^{h(\beta_{q,t}),\beta_{q,t}}$. Also

 n \ q   0.35          0.36          0.37          0.38          0.39          0.4
 16      0.12475       0.1247        0.12521       0.12425       0.12441       0.12435
         (-0.008131)   (-0.008132)   (-0.008116)   (-0.008146)   (-0.008141)   (-0.008143)
 32      0.01107       0.011056      0.011116      0.010941      0.010972      0.010729
         (-0.004398)   (-0.004399)   (-0.004394)   (-0.004409)   (-0.004407)   (-0.004429)
 64      2.1919e-06    2.0283e-06    2.6073e-06    5.3287e-07    1.3822e-06    3.5772e-06
         (-0.003181)   (-0.003200)   (-0.003139)   (-0.003527)   (-0.003294)   (-0.003062)
 96      1.1036e-11    1.6868e-11    2.0805e-11    4.4039e-11    2.6124e-11    4.497e-11
         (-0.002738)   (-0.002692)   (-0.002669)   (-0.002587)   (-0.002644)   (-0.002585)

Table 4.1. Comparison of the estimates for the probability $\mu_n$ (top number)
for varying tilts $\mathbb{Q}_n^{h(\beta_{q,t}),\beta_{q,t}}$ where the parameters $\beta = \beta_{q,t}$ are defined in
(4.1). Also shown is the log probability $\frac{1}{n^2} \log \mathbb{P}(\mathcal{G}_{n,p} \in \mathcal{W}_t)$ (lower number).


 n \ q   0.35          0.36          0.37          0.38          0.39          0.4
 16      0.030839      0.03007       0.030173      0.030553      0.031902      0.034105
         (-0.01199)    (-0.01206)    (-0.01204)    (-0.01203)    (-0.01191)    (-0.01173)
 32      0.00039386    0.00038614    0.00040055    0.00042058    0.00047462    0.00052598
         (-0.007391)   (-0.007407)   (-0.007377)   (-0.007347)   (-0.007253)   (-0.00718)
 64      2.9982e-11    2.5716e-11    4.6804e-11    2.3144e-12    1.9783e-11    1.8035e-10
         (-0.005879)   (-0.005917)   (-0.005774)   (-0.006513)   (-0.005995)   (-0.005461)
 96      1.157e-21     2.7721e-21    4.9044e-21    2.8661e-20    1.4562e-20    6.8628e-20
         (-0.005220)   (-0.005125)   (-0.005065)   (-0.004876)   (-0.004951)   (-0.004785)

Table 4.2. Comparison of the estimates for the variance $\mathrm{Var}_{\mathbb{Q}}(q_n)$ (top
number) for varying tilts $\mathbb{Q}_n^{h(\beta_{q,t}),\beta_{q,t}}$ where the parameters $\beta = \beta_{q,t}$ are defined
in (4.1). Also shown is the log second moment $\frac{1}{n^2} \log \mathbb{E}_{\mathbb{Q}}[q_n^2]$ (lower number).



shown is the estimate for the log probability, $\frac{1}{n^2} \log \mathbb{P}(\mathcal{G}_{n,p} \in \mathcal{W}_t)$. Since $(p,t)$ is replica
symmetric, the LDP rate is

$$\lim_{n\to\infty} \frac{1}{n^2} \log \mathbb{P}(\mathcal{G}_{n,p} \in \mathcal{W}_t) = -I_p(t) = -0.002694.$$

The value of the log probability is seen to approach the LDP rate as $n$ is increased.
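
The quoted rate can be reproduced directly from the scalar rate function, assuming $I_p(u) = \frac{1}{2}[u \log(u/p) + (1-u)\log((1-u)/(1-p))]$. The short sketch below evaluates it at $(p,t) = (0.35, 0.4)$ and compares it with the finite-$n$ log probabilities reported for the triangle tilt ($q = 0.35$) in Table 4.1; the variable names are illustrative.

```python
import math

def rate(u, p):
    """I_p(u) = (1/2)[u log(u/p) + (1 - u) log((1 - u)/(1 - p))]."""
    return 0.5 * (u * math.log(u / p) + (1.0 - u) * math.log((1.0 - u) / (1.0 - p)))

ldp_rate = rate(0.4, 0.35)

# -(1/n^2) log mu_n for the column q = 0.35 of Table 4.1
table_values = {16: 0.008131, 32: 0.004398, 64: 0.003181, 96: 0.002738}
gaps = {n: value - ldp_rate for n, value in table_values.items()}

print(ldp_rate)
print(gaps)
```

The gaps shrink as $n$ grows, which is the convergence towards the LDP rate described above.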

Table 4.2 shows the estimated values of the variance of the estimator, $\mathrm{Var}_{\mathbb{Q}_n}(q_n)$, where
$q_n = 1_{\mathcal{W}_t} \frac{d\mathbb{P}_{n,p}}{d\mathbb{Q}_n}$, as well as the log second moment $\frac{1}{n^2} \log \mathbb{E}_{\mathbb{Q}_n}[q_n^2]$. The variance of the
estimator for all the tilts appears to be comparable and the log second moment likewise
appears to converge towards $-2I_p(t) = -0.0053869$. In this case, $p = 0.35$, $t = 0.4$ does
not belong to the regime described in Proposition 3.2, and the numerical results suggest
that all the tilts of this form, including the edge tilt, appear to be close to asymptotically
optimal.

For $n = 16, 32, 64$, the number of MCMC samples used was $5 \times 10^4\, n^2 \log n$, while for
$n = 96$, the number of MCMC samples used was $10^5\, n^2 \log n$.

Both the random graphs corresponding to the triangle or edge tilts are expected by
(2.27), (2.28) to have $\binom{n}{3} t^3$ triangles and $\binom{n}{2} t$ edges on average. However, there is a
difference between the way that the triangle and edge tilts produce events in $\{\mathcal{T}(f) \ge t^3\}$,
and that is in the number of edges in the successful samples that fall in the rare event.

Figure 4.1. Histogram of edge counts in the samples obtained by the
triangle and edge tilts, conditioned on the rare event. The solid red line
shows the triangle tilt; the dashed blue line shows the edge tilt; the dotted
green line shows the Monte Carlo sampling.

The distribution of the edge count of successful samples from the triangle tilt has a larger
proportion with fewer than $\binom{n}{2} t$ edges, as compared to the edge tilt. This is shown in Figure
4.1.

4.3. Importance sampling with conditioned Gibbs measures. Quite a different
issue from the asymptotic optimality of the importance sampling estimator is the question
of the efficiency of the Glauber dynamics in drawing samples from the tilted measure.
The efficiency of using an MCMC to draw samples is subject to the mixing time of the
Markov chain. In the case of the exponential random graph, the mixing time of some such
graphs is known to be exponentially long, $O(e^n)$, due to the fact that the Hamiltonian
$\mathcal{H}(f)$ has multiple local maxima [1]. In this section, we propose a way to sidestep this
issue, by using a conditioned version of the Gibbs measure, in which the sampling from
the exponential random graph is restricted to an appropriate subregion of the state space
$\Omega_n$. Conditioning the Gibbs measure on the desired subregion of the state space serves to
focus the sampling on the region of the state space that really matters, and possibly also
improves the mixing time of the Markov chain.

The conditioned Gibbs measure is particularly apt in the following scenario. Suppose,
for given $(p,t)$, the variational form in (2.29) is locally, but not globally, maximized by $t$ (cf.
Figure 4.2). If $u^* \ne t$ is the global maximum of (2.29), then $\mathcal{G}_n^{h_p,\beta,\alpha}$ is indistinguishable
from the conditioned Erdos-Renyi graph $\mathcal{G}_{n,p}$ conditioned on $\{\mathcal{T}(f) \ge (u^*)^3\}$. Compared
to our target of exceeding $\binom{n}{3} t^3$ triangles, the samples from $\mathbb{Q}_n^{h_p,\beta,\alpha}$ will have an over- or
under-abundance of triangles, leading to a poor estimator with very large variance. Recall
that Proposition 3.1 shows that the importance sampling estimator based on $\mathbb{Q}_n^{h_p,\beta,\alpha}$ is
non-optimal. The conditioned Gibbs measure mitigates this problem by restricting the
exponential random graph to having just the "right" number of triangles.

Conditioned Gibbs measure. Given a set $A \subset \mathcal{W}$, the exponential random graph
conditioned on $A$, denoted $\mathcal{G}_{n,A}^{h,\beta,\alpha}$, has the conditional Gibbs measure

$$\mathbb{Q}_{n,A}^{h,\beta,\alpha}(X) = \begin{cases} \exp\left( n^2 (\mathcal{H}(X) - \psi_{n,A}^{h,\beta,\alpha}) \right) & \text{if } X \in A, \\ 0 & \text{otherwise,} \end{cases}$$

where the Hamiltonian $\mathcal{H}(X)$ is defined in (2.16). The free energy $\psi_{n,A} = \psi_{n,A}^{h,\beta,\alpha}$ is

$$\psi_{n,A} = \frac{1}{n^2} \log \sum_{X \in A} e^{n^2 \mathcal{H}(X)}.$$

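The following proposition describes the asymptotic behaviour of the free energy, which
is an analogue of Theorems 3.1 and 3.2 in [5].

Proposition 4.1. For any bounded continuous mapping $\mathcal{H} : \mathcal{W} \to \mathbb{R}$, and any closed
subset $A \subset \mathcal{W}$, let $\psi_{n,A} = \frac{1}{n^2} \log \sum_{X \in A} e^{n^2 \mathcal{H}(X)}$. Then

$$\lim_{n\to\infty} \psi_{n,A} = \sup_{f \in \mathcal{W} \cap A} [\mathcal{H}(f) - \mathcal{I}(f)]. \qquad (4.2)$$

Moreover, if $\mathcal{F}^* \subset \mathcal{W}$ is the subset on which $[\mathcal{H}(f) - \mathcal{I}(f)]$ is maximized, then for any $\epsilon > 0$,
there exists a constant $\delta > 0$ such that

$$\limsup_{n\to\infty} \frac{1}{n^2} \log \mathbb{P}\left( \delta_\square(\mathcal{G}_{n,A}, \mathcal{F}^*) > \epsilon \right) \le -\delta.$$

Proof. This follows from a simple modification of the proof of Theorem 3.1 and 3.2 in [5]
to restrict to the set $A$. ■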

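The importance sampling scheme based on the conditioned Gibbs measure $\mathbb{Q}_{n,A}^{h,\beta,\alpha}$ gives
the estimator

$$\hat{\nu}_A = \frac{1}{K} \sum_{k=1}^{K} 1_{\mathcal{W}_t}(X_k) \frac{d\mathbb{P}_{n,p,A}}{d\mathbb{Q}_{n,A}^{h,\beta,\alpha}}(X_k), \quad \text{where } X_k \sim \mathbb{Q}_{n,A}^{h,\beta,\alpha} \text{ i.i.d.} \qquad (4.3)$$

where $\mathbb{P}_{n,p,A} = \mathbb{P}_{n,p}|_A$. Note that $\hat{\nu}_A$ is an unbiased estimator for $\nu_{n,p} = \mathbb{P}_{n,p,A}(\mathcal{W}_t)$. The
estimator $\hat{\mu}_n$ for $\mu_n = \mathbb{P}_{n,p}(\mathcal{W}_t)$ can be obtained from $\hat{\nu}_A$ by

$$\hat{\mu}_n = \hat{\nu}_A \cdot \mathbb{P}_{n,p}(A) + \mathbb{P}_{n,p}(\mathcal{W}_t \cap A^c).$$

Since the two probabilities on the RHS, particularly the second term, may not be easily
computable or estimated, we may alternatively take $\hat{\nu}_A$ as a biased estimator for $\mu_n$. By an
appropriate choice of the set $A$, we can ensure the bias is small and vanishes exponentially
faster than the small probability we are trying to estimate (see Lemma A.4(ii)).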

In our application, the conditioning of the Gibbs measure is applied to control the
number of triangles that the sampled graphs are allowed to have. Thus, it is natural to
choose the set $A$ of the form

$$A_J = \{ f \in \mathcal{W} : \mathcal{T}(f) \in J \} \cap A_0 \qquad (4.4)$$

where $J \subset [0,1]$ is a closed interval and $A_0 \subset \mathcal{W}$ is a closed subset containing all the
constant functions in $J$. Then the set $A_J \subset \mathcal{W}$ is closed because $\mathcal{T}$ is continuous in the
cut distance metric $\delta_\square$. A consequence of Proposition 4.1 is a variational formulation
similar to (2.23).

Proposition 4.2. Let $A_J$ be defined in (4.4). Given any Gibbs measure parameters
$(h,\beta,\alpha)$, assume wlog that $h = h_q = \log\frac{q}{1-q}$ for some $q \in (0,1)$. For $u \in [0,1]$,
denote $\partial\mathcal{W}_u := \{ f \in \mathcal{W} \mid \mathcal{T}(f) = u^3 \}$ and let $\mathcal{F}^*_u \subset \mathcal{W}$ be the set of minimizers of
$\inf_{f \in A_0 \cap \partial\mathcal{W}_u} [\mathcal{I}_q(f)]$. Then

$$\lim_{n\to\infty} \psi_{n,A_J} = \sup_{f \in A_J} [\mathcal{H}(f) - \mathcal{I}(f)] = \sup_{u \in J} \left[ \frac{\beta}{6} u^{3\alpha} - \inf_{f \in A_0 \cap \partial\mathcal{W}_u} [\mathcal{I}_q(f)] \right] - \frac{1}{2}\log(1-q). \qquad (4.5)$$

The supremum $\sup_{f \in A_J} [\mathcal{H}(f) - \mathcal{I}(f)]$ is attained exactly on the set $\mathcal{F}^*_{v^*}$, where $v^*$ maximizes
the RHS of (4.5).

Further, if $(q,v^*)$ belongs to the replica symmetric phase, then the supremum
$\sup_{f \in A_J} [\mathcal{H}(f) - \mathcal{I}(f)]$ is attained uniquely by the constant function $f^*(x,y) \equiv v^*$, and

$$\lim_{n\to\infty} \psi_{n,A_J} = V(v^*) = \sup_{u \in J} [V(u)],$$

where $V(u) = \frac{h}{2} u + \frac{\beta}{6} u^{3\alpha} - I(u)$.

The proof is identical to the proof of Theorem 2.5 and is left to the appendix. Combining
Propositions 4.1 and 4.2, the exponential random graph conditioned on $A_J$ is
asymptotically indistinguishable from the graphs in the set $\mathcal{F}^*_{v^*}$. The case when $J = [0,1]$
and $A_0 = \mathcal{W}$, which is when there is no conditioning, coincides with Theorem 2.5.

Using Proposition 4.2, the notion of the triangle tilt can be extended to the importance
sampling schemes using the conditioned Gibbs measures, in a similar way as in Section 2.2
for the full Gibbs measure, as follows. Given $(p,t)$ and the set $A_J$, suppose there exist
parameters $(h_p, \beta, \alpha)$ such that

$$t = \arg\sup_{u \in J} \left[ \frac{\beta}{6} u^{3\alpha} - \inf_{f \in \partial\mathcal{W}_u \cap A_0} [\mathcal{I}_p(f)] \right]. \qquad (4.6)$$

The conditioned Gibbs measure $\mathbb{Q}_{n,A_J}^{h_p,\beta,\alpha}$ is a (conditioned) triangle tilt with parameter $\alpha$
corresponding to $(p,t)$.

Under some mild conditions on the set $A_J$, Lemma A.4 shows that

$$\lim_{n\to\infty} \frac{1}{n^2} \log \mathbb{P}_{n,p,A_J}(\mathcal{W}_t) = -\inf_{f \in \mathcal{W}_t \cap A_J} [\mathcal{I}_p(f)] = -\inf_{f \in \mathcal{W}_t} [\mathcal{I}_p(f)]. \qquad (4.7)$$

Thanks to (4.7), the notion of asymptotic optimality for conditioned tilts is unchanged.
As a corollary, we have that the conditioned triangle tilt also yields an asymptotically
optimal importance sampling scheme. The proof is left to the appendix.

Corollary 4.3. Given any $(p,t)$, let $A_J$ be defined in (4.4) with $p \in J$ and $t \in J^\circ$ in
the interior of $J$. Suppose that there exists a conditioned Gibbs measure $\mathbb{Q}_{n,A_J}^{h_p,\beta,\alpha}$ that is a
conditioned triangle tilt satisfying (4.6). Then the importance sampling estimator
based on the conditioned triangle tilt is asymptotically optimal.

We remark here that if the Glauber dynamics is used to generate samples from $\mathbb{Q}_{n,A_J}^{h,\beta,\alpha}$,
we must require that $A_J$ is connected. A sufficient condition for $A_J$ to be connected is if
$J$ is an interval of the form $[0,r]$ or $[r,1]$.

Figure 4.2. The phase curve denotes the values of the stationary points
of the variational form $V(u) = \frac{h_p}{2} u + \frac{\beta}{6} u^{3\alpha} - I(u)$, as $\beta$ varies, and given
$\alpha = 1$, $h_p = \log\frac{p}{1-p}$, $p = 0.2$. The red solid line denotes when the stationary
point is a global maximum of $V(u)$; the red dotted line denotes the local
maximum; the blue dashed line denotes the local minimum. At the phase
transition point at $\beta \approx 4.76$, the maximum of the variational form jumps
from $u^* \approx 0.253$ to $u^* \approx 0.947$. The inset shows the function $V(u)$ for
$\beta = \beta^* \approx 5.99$ attaining a local maximum at $t = 0.3$ and global maximum
at $u^* \approx 0.989$.



Numerical illustration of a conditioned Gibbs measure. We illustrate the conditional Gibbs
measure tilt with an example. For concreteness, let us set $p = 0.2$ and $t = 0.3$, and for
the Gibbs measure parameters, set $h = h_p$ and $\alpha = 1$, and vary $\beta \ge 0$. We will study
how the asymptotic second moment changes as $\beta$ varies. The pair $(p,t) = (0.2, 0.3)$ is in
the replica symmetric phase $S_{2/3}$. For the triangle tilt with $\alpha = 1$, we have from (2.31)
$\beta = \beta^* = (h_t - h_p)/t^2$. The variational form $V(u; h_p, \beta^*) = \frac{h_p}{2} u + \frac{\beta^*}{6} u^3 - I(u)$ has a
local maximum at $t = 0.3$ but is maximized at a value $u^* \approx 0.989$. (See Figure 4.2.) So
$(p,t) \notin S_1$. The exponential graph $\mathcal{G}_n^{h_p,\beta^*}$ will produce on average $\binom{n}{3} (u^*)^3$ triangles. This
is too many triangles, and the variance of the importance sampling estimator will blow
up.
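
The numbers in this example can be checked with a short grid search. The sketch below assumes $I(u) = \frac{1}{2}[u \log u + (1-u)\log(1-u)]$ and evaluates $V(u; h_p, \beta^*) = \frac{h_p}{2}u + \frac{\beta^*}{6}u^3 - I(u)$ on a fine grid; the grid resolution and all names are illustrative choices.

```python
import math
import numpy as np

def logit(x):
    return math.log(x / (1.0 - x))

def profile(u, h, beta):
    """V(u; h, beta) for alpha = 1."""
    entropy = 0.5 * (u * np.log(u) + (1.0 - u) * np.log(1.0 - u))
    return 0.5 * h * u + (beta / 6.0) * u ** 3 - entropy

p, t = 0.2, 0.3
h_p = logit(p)
beta_star = (logit(t) - h_p) / t ** 2            # triangle tilt parameter for alpha = 1

u = np.linspace(1e-6, 1.0 - 1e-6, 200001)
values = profile(u, h_p, beta_star)
u_global = float(u[np.argmax(values)])           # global maximizer on the grid

# Second derivative of V at t: beta * t - 1 / (2 t (1 - t)); negative means a local maximum.
curvature_at_t = beta_star * t - 1.0 / (2.0 * t * (1.0 - t))

print(beta_star, u_global, curvature_at_t)
print(float(profile(np.array([t]), h_p, beta_star)[0]), float(values.max()))
```

The stationary point at $t$ is a strict local maximum, but its value lies well below the value attained near $u \approx 0.989$, which is the mismatch that motivates the conditioning.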

To avoid getting samples with too many triangles, let us restrict the state space to cap the number of triangles and edges,

$$A_r = \{ f \in \mathcal{W} : \mathcal{T}(f) \le r^3 \text{ and } \mathcal{E}(f) \le r \} \tag{4.8}$$

for some $r > t$. With $h = h_p$ fixed and for $\beta > 0$, the asymptotic second moment of the estimator under $\tilde{\mathbb{Q}}_n = \mathbb{Q}_{n,A_r}^{h_p,\beta,1}$ is



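$$\begin{aligned}
\lim_{n\to\infty}\frac{1}{n^2}\log E_{\tilde{\mathbb{Q}}_n}[\hat q_n^2]
&= -\inf_{f\in\mathcal{W}_t\cap A_r}\Big[\mathcal{I}_p(f) + \frac{\beta}{6}\mathcal{T}(f)\Big] + V(u^*) + \frac{1}{2}\log(1-p) \\
&= -\inf_{f\in\mathcal{W}_t}[\mathcal{I}_p(f)] + \frac{\beta}{6}\big((u^*)^3 - t^3\big) - \inf_{f\in\partial\mathcal{W}_{u^*}}[\mathcal{I}_p(f)],
\end{aligned}$$

where $u^* = \arg\sup_{0\le u\le r}[V(u; h_p, \beta)]$.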
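The effect of the cap $r$ on this limit can be explored numerically. The following sketch is an illustration, not the authors' code. It assumes that in the replica symmetric phase the infimum over $\mathcal{W}_t \cap A_r$ is attained by the constant function $t$, so that the first line reduces to $-[I_p(t) + \frac{\beta}{6}t^3] + \max_{0<u\le r}V(u) + \frac12\log(1-p)$, with $I_p(u) = \frac12[u\log\frac{u}{p} + (1-u)\log\frac{1-u}{1-p}]$. The function names are hypothetical.

```python
import math

def logit(x):
    return math.log(x / (1.0 - x))

def entropy_I(u):
    return 0.5 * (u * math.log(u) + (1.0 - u) * math.log(1.0 - u))

def rate_Ip(u, p):
    # assumed scalar rate function I_p(u)
    return 0.5 * (u * math.log(u / p)
                  + (1.0 - u) * math.log((1.0 - u) / (1.0 - p)))

def V(u, h, beta):
    return 0.5 * h * u + beta / 6.0 * u**3 - entropy_I(u)

def asymptotic_second_moment(beta, p, t, r, grid=10000):
    """Scalar version of the limit, valid under the assumption stated above.
    r = 1.0 corresponds to no conditioning."""
    us = [i / grid for i in range(1, grid) if i / grid <= r]
    v_max = max(V(u, logit(p), beta) for u in us)
    return -(rate_Ip(t, p) + beta / 6.0 * t**3) + v_max + 0.5 * math.log(1.0 - p)

p, t = 0.2, 0.3
beta_star = (logit(t) - logit(p)) / t**2
conditioned = asymptotic_second_moment(beta_star, p, t, r=0.4272)
unconditioned = asymptotic_second_moment(beta_star, p, t, r=1.0)
print(conditioned, -2.0 * rate_Ip(t, p), unconditioned)
```

With $r \approx 0.4272$ the conditioned value agrees with $-2I_p(t)$, the asymptotically optimal rate, while the unconditioned value at $\beta^*$ is much larger.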
Figure 4.3. A plot of the asymptotic second moment, $\lim_{n\to\infty}\frac{1}{n^2}\log E_{\tilde{\mathbb{Q}}_n}[\hat q_n^2]$, of the importance sampling estimator based on the conditioned Gibbs tilt for fixed $h = h_p$ and varying $\beta$. The inset is a zoom-in to show that the smallest variance is attained at $\beta = \beta^* = 5.99$. The dotted line shows the rapid deterioration of the asymptotic second moment of the estimator without the use of conditioning. Parameters used are $p = 0.2$ and $t = 0.3$.



Figure 4.3 shows the asymptotic second moment for the tilts $(h_p, \beta)$ both with and without conditioning of the Gibbs measure.

When the tilt with $\beta = \beta^*$ is conditioned on $A_r$, it gives the best estimator and is asymptotically optimal by Corollary 4.3. This is corroborated by the numerical simulations, which suggest that the triangle tilt performs significantly better than crude Monte Carlo sampling, and also outperforms the edge tilt. In contrast, when no conditioning is performed, the IS estimator exhibits a sharp decline in performance when $\beta$ is increased beyond the transition point at $\beta \approx 4.76$ (cf. Figure 4.2). This transition point coincides with the phase transition at which the exponential random graph $\mathcal{G}_n^{h_p,\beta}$ changes from a graph with low edge density to one with high edge density. As mentioned above, the graph with high edge density overproduces triangles, causing the estimator to have a large variance.
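The location of the transition point can be estimated by bisection on $\beta$, tracking whether the global maximizer of $V(u; h_p, \beta)$ sits on the low-density or the high-density branch. This is a sketch under the same assumed form of $I(u)$ as in Figure 4.2. The threshold $0.5$ separating the two branches and the function names are choices made for this example with $p = 0.2$.

```python
import math

def logit(x):
    return math.log(x / (1.0 - x))

def entropy_I(u):
    return 0.5 * (u * math.log(u) + (1.0 - u) * math.log(1.0 - u))

def V(u, h, beta):
    return 0.5 * h * u + beta / 6.0 * u**3 - entropy_I(u)

def argmax_u(beta, p, grid=2000):
    # grid search for the global maximizer of V(.; h_p, beta) on (0, 1)
    us = [i / grid for i in range(1, grid)]
    return max(us, key=lambda u: V(u, logit(p), beta))

def transition_beta(p, lo=1.0, hi=6.0, iters=30):
    # below the transition the maximizer is a low density, above it a high one
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if argmax_u(mid, p) > 0.5:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

beta_c = transition_beta(0.2)
print(beta_c, argmax_u(beta_c - 0.2, 0.2), argmax_u(beta_c + 0.2, 0.2))
```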



Triangle tilt with parameter $\alpha$ and conditioned Gibbs measure. The importance sampling scheme was next performed for $p = 0.2$ and $t = 0.3$, in the replica symmetric phase. We now consider the following tilted measures, all of whose exponential random graphs are indistinguishable from the Erdős-Rényi graph $\mathcal{G}_{n,t}$.

- Triangle tilt with $\alpha = 2/3$: $\mathbb{Q}_n^{h_p,\beta^*_{2/3},2/3}$, where $\beta^*_{2/3}$ is defined in (2.31) with $\alpha = 2/3$.

- Conditioned triangle tilt with $\alpha = 1$: $\mathbb{Q}_{n,A_r}^{h_p,\beta^*,1}$, where $\beta^*$ is defined in (2.31) with $\alpha = 1$, and $A_r$ is defined in (4.8) with $r \approx 0.4272 > t$, which is a local minimum of $V(u)$.

- Edge tilt: $\mathbb{Q}_n^{h_t,0} = \mathbb{P}_{n,t}$.
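The cap $r \approx 0.4272$ used for the conditioned triangle tilt can be recovered as the stationary point of $V(u; h_p, \beta^*)$ lying between the local maximum at $t$ and the global maximum. The sketch below assumes $I'(u) = \frac12\log\frac{u}{1-u}$, which follows from the assumed form of $I(u)$. The bracketing interval $[0.35, 0.6]$ and the helper names are illustrative.

```python
import math

def logit(x):
    return math.log(x / (1.0 - x))

def dV(u, h, beta):
    # derivative of V(u) = (h/2) u + (beta/6) u^3 - I(u)
    return 0.5 * h + 0.5 * beta * u**2 - 0.5 * logit(u)

def bisect_root(f, lo, hi, iters=60):
    # plain bisection; assumes f(lo) and f(hi) have opposite signs
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

p, t = 0.2, 0.3
beta_star = (logit(t) - logit(p)) / t**2
r = bisect_root(lambda u: dV(u, logit(p), beta_star), 0.35, 0.6)
print(r)
```

The derivative changes sign from negative to positive at this root, which identifies it as a local minimum.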
n     Triangle tilt $\alpha = 2/3$   Conditioned triangle tilt   Edge tilt                Monte Carlo
16    U.UU04 (-0.01b)                U.UU04 1 4 (-0.0197)        U.UUozou (-0.0198)
32    4.3148e-7 (-0.0143)            3.5488e-7 (-0.0145)         3.3878e-7 (-0.0145)      3.7758e-7 (-0.0144)
48    1.3976e-13 (-0.0128)           1.1418e-14 (-0.0139)        1.2039e-12 (-0.0119)
64    6.1882e-21 (-0.0136)           2.9076e-23 (-0.0127)        1.8316e-19 (-0.0105)

Table 4.3. Estimates for the probability $\mu_n$. In parentheses is the estimator for the log probability $\frac{1}{n^2}\log\mu_n$.
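The parenthesized entries can be cross-checked against the estimates themselves. The short example below assumes the normalization $\frac{1}{n^2}\log\mu_n$, which is inferred from the large deviation scaling and is consistent with the legible rows of the table. The function name `normalized_log` is hypothetical.

```python
import math

def normalized_log(value, n):
    # (1 / n^2) * log(value), the assumed scaling of the parenthesized entries
    return math.log(value) / n**2

# triangle tilt (alpha = 2/3) at n = 32, and conditioned triangle tilt at n = 64
a = normalized_log(4.3148e-7, 32)
b = normalized_log(2.9076e-23, 64)
print(round(a, 4), round(b, 4))
```

Both values agree with the tabulated entries $-0.0143$ and $-0.0127$ to four decimal places.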



n     Triangle tilt $\alpha = 2/3$   Conditioned triangle tilt   Edge tilt                Monte Carlo
16    1.5059e-4 (-0.0334)            2.4166e-4 (-0.0319)         1.0391e-3 (-0.0267)
32    1.5222e-12 (-0.0265)           1.9083e-12 (-0.0263)        6.7116e-11 (-0.0229)     3.7758e-7 (-0.0144)
48    2.6058e-25 (-0.0245)           3.268e-27 (-0.0265)         2.4737e-20 (-0.0196)
64    7.1703e-40 (-0.0220)           2.8806e-44 (-0.0245)        1.2806e-33 (-0.0185)

Table 4.4. Estimates for the variance $\mathrm{Var}_{\mathbb{Q}_n}(\hat q_n)$. In parentheses is the estimate for the log second moment, $\frac{1}{n^2}\log E_{\mathbb{Q}_n}[\hat q_n^2]$.



Tables 4.3 and 4.4 show the estimated values for the mean and variance of $\hat q_n = 1_{\mathcal{W}_t}\frac{d\mathbb{P}_{n,p,A_J}}{d\mathbb{Q}_{n,A_J}}$. The direct Monte Carlo simulation is shown for $n = 32$ to verify the estimates. We observe that both triangle tilts perform comparably, and both outperform the edge tilt.
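For readers who want to experiment, the simplest of the three schemes, the edge tilt $\mathbb{P}_{n,t}$, can be sketched in a few lines. This is a toy illustration for very small $n$, not the simulation code behind Tables 4.3 and 4.4. It assumes the rare event is $\{T(X) \ge \binom{n}{3}t^3\}$ with $T$ the triangle count, and all function names are hypothetical.

```python
import math
import random
from itertools import combinations

def sample_graph(n, q, rng):
    """Adjacency sets of an Erdos-Renyi graph with edge probability q."""
    adj = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if rng.random() < q:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def count_edges(adj):
    return sum(len(a) for a in adj) // 2

def count_triangles(adj):
    # every triangle is seen once from each of its 3 edges
    total = 0
    for i, nbrs in enumerate(adj):
        for j in nbrs:
            if j > i:
                total += len(nbrs & adj[j])
    return total // 3

def log_weight(edges, pairs, p, q):
    """log of dP_{n,p}/dP_{n,q} for a graph with the given edge count."""
    return (edges * math.log(p / q)
            + (pairs - edges) * math.log((1.0 - p) / (1.0 - q)))

def edge_tilt_estimate(n, p, t, samples, seed=0):
    """Importance sampling estimate of P(T >= C(n,3) t^3) under G(n,p),
    sampling from G(n,t). Returns (sample mean, sample variance)."""
    rng = random.Random(seed)
    pairs = n * (n - 1) // 2
    threshold = math.comb(n, 3) * t**3
    values = []
    for _ in range(samples):
        adj = sample_graph(n, t, rng)
        if count_triangles(adj) >= threshold:
            values.append(math.exp(log_weight(count_edges(adj), pairs, p, t)))
        else:
            values.append(0.0)
    mean = sum(values) / samples
    var = sum((v - mean) ** 2 for v in values) / (samples - 1)
    return mean, var

mean, var = edge_tilt_estimate(n=8, p=0.2, t=0.3, samples=500)
print(mean, var)
```

Setting `t = p` makes every weight equal to one, so the same routine reduces to crude Monte Carlo, which gives a convenient sanity check.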



Appendix A. Auxiliary lemmas and proofs 

We collate a number of lemmas and proofs in this section, roughly in the order that 
they appear in the paper. 

Lemma A.1. (i) Given $(p,t)$, let $\mathcal{F}^*$ be the set of functions that minimize the LDP rate function, $\inf_{f\in\mathcal{W}_t}[\mathcal{I}_p(f)]$ in (2.9). Then $\mathcal{F}^*$ is the minimal set that the Erdős-Rényi graph $\mathcal{G}_{n,p}$ conditioned on $\{\mathcal{T}(f) \ge t^3\}$ is asymptotically indistinguishable from.
(ii) Given $(h,\beta,\alpha)$, let $\mathcal{F}^*$ be the set of functions that maximize $\sup_{f\in\mathcal{W}}[\mathcal{H}(f) - \mathcal{I}(f)]$. Then $\mathcal{F}^*$ is the minimal set that the exponential random graph $\mathcal{G}_n^{h,\beta,\alpha}$ is asymptotically indistinguishable from.

Proof. The asymptotic indistinguishability from $\mathcal{F}^*$ was shown in [8, Theorem 3.1] for (i) and [5, Theorem 3.22] for (ii). The proofs naturally extend to give the minimality of $\mathcal{F}^*$, and we state them here for the record.

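Observe that for any random graph $\mathcal{G}_n$ that is asymptotically indistinguishable from a set $\mathcal{F}^*$, to show that $\mathcal{F}^*$ is minimal, it suffices to show that, for any relatively open non-empty subset $\mathcal{F}_0 \subset \mathcal{F}^*$ such that $\mathcal{F}^* \setminus \mathcal{F}_0$ is non-empty, there exists $\epsilon > 0$ such that

$$\liminf_{n\to\infty}\frac{1}{n^2}\log P\big(\delta_\square(\mathcal{G}_n, \mathcal{F}^*\setminus\mathcal{F}_0) > \epsilon\big) = 0. \tag{A.1}$$

Let $\mathcal{F}_0 \subset \mathcal{F}^*$ be any relatively open non-empty subset, with $\mathcal{F}^*\setminus\mathcal{F}_0$ non-empty. Denote, for $\epsilon > 0$,

$$\mathcal{F}_\epsilon = \{ f \in \mathcal{W} \mid \delta_\square(f, \mathcal{F}^*\setminus\mathcal{F}_0) > \epsilon \}.$$

(i) Since $\mathcal{F}_0$ is relatively open in $\mathcal{F}^*$, $\delta_\square(f, \mathcal{F}^*\setminus\mathcal{F}_0) > 0$ for any $f \in \mathcal{F}_0$. So, there exists an $\epsilon > 0$ sufficiently small such that $(\mathcal{F}_\epsilon \cap \mathcal{W}_t)^\circ$ contains at least one element of $\mathcal{F}_0$. ($A^\circ$ denotes the interior of $A$.) It follows that

$$\inf_{f\in(\mathcal{F}_\epsilon\cap\mathcal{W}_t)^\circ}[\mathcal{I}_p(f)] = \inf_{f\in\mathcal{W}_t}[\mathcal{I}_p(f)].$$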



Since 

from the large deviation principle in [8, Theorem 2.3] implies that 
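$$\begin{aligned}
\liminf_{n\to\infty}\frac{1}{n^2}\log P(\mathcal{G}_{n,p}\in\mathcal{F}_\epsilon \mid \mathcal{G}_{n,p}\in\mathcal{W}_t)
&= \liminf_{n\to\infty}\Big[\frac{1}{n^2}\log P(\mathcal{G}_{n,p}\in\mathcal{F}_\epsilon\cap\mathcal{W}_t) - \frac{1}{n^2}\log P(\mathcal{G}_{n,p}\in\mathcal{W}_t)\Big] \\
&\ge -\inf_{f\in(\mathcal{F}_\epsilon\cap\mathcal{W}_t)^\circ}[\mathcal{I}_p(f)] + \inf_{f\in\mathcal{W}_t}[\mathcal{I}_p(f)] \\
&= 0.
\end{aligned}$$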

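(ii) Since $\mathcal{F}_0$ is relatively open in $\mathcal{F}^*$, there exists an $\epsilon > 0$ sufficiently small such that $\mathcal{F}_\epsilon^\circ$ contains at least one element of $\mathcal{F}_0$, and

$$\sup_{f\in\mathcal{F}_\epsilon^\circ}[\mathcal{H}(f) - \mathcal{I}(f)] = \sup_{f\in\mathcal{W}}[\mathcal{H}(f) - \mathcal{I}(f)].$$

Since the Hamiltonian $\mathcal{H}$ is bounded, for any $\eta > 0$, there is a finite set $A \subset \mathbb{R}$ such that the intervals $\{(a, a+\eta), a \in A\}$ cover the range of $\mathcal{H}$. Let $\mathcal{F}_\epsilon^a = \mathcal{F}_\epsilon \cap \mathcal{H}^{-1}([a, a+\eta])$, and let $\mathcal{F}_\epsilon^{a,n} = \mathcal{F}_\epsilon^a \cap \Omega_n$ be the functions corresponding to a simple finite graph. Then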



a G A aeA 



= n 2 ai Tra,n| 



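and

$$\frac{1}{n^2}\log P(\mathcal{G}_n\in\mathcal{F}_\epsilon) \ge -\psi_n + \sup_{a\in A}\Big[a + \frac{1}{n^2}\log|\mathcal{F}_\epsilon^{a,n}|\Big].$$

By an observation in [5, Eqn. (3.4)], for any open set $U \subset \mathcal{W}$, and $U_n = U \cap \Omega_n$,

$$\liminf_{n\to\infty}\frac{1}{n^2}\log|U_n| \ge -\inf_{f\in U}[\mathcal{I}(f)].$$

Then, since

$$\sup_{f\in(\mathcal{F}_\epsilon^a)^\circ}[\mathcal{H}(f) - \mathcal{I}(f)] \le \sup_{f\in(\mathcal{F}_\epsilon^a)^\circ}[a + \eta - \mathcal{I}(f)] = a + \eta - \inf_{f\in(\mathcal{F}_\epsilon^a)^\circ}[\mathcal{I}(f)],$$

we have that

$$\begin{aligned}
\liminf_{n\to\infty}\frac{1}{n^2}\log P(\mathcal{G}_n\in\mathcal{F}_\epsilon)
&\ge -\sup_{f\in\mathcal{W}}[\mathcal{H}(f) - \mathcal{I}(f)] + \sup_{a\in A}\Big[a - \inf_{f\in(\mathcal{F}_\epsilon^a)^\circ}[\mathcal{I}(f)]\Big] \\
&\ge -\sup_{f\in\mathcal{W}}[\mathcal{H}(f) - \mathcal{I}(f)] + \sup_{a\in A}\ \sup_{f\in(\mathcal{F}_\epsilon^a)^\circ}[\mathcal{H}(f) - \mathcal{I}(f)] - \eta \\
&\ge -\sup_{f\in\mathcal{W}}[\mathcal{H}(f) - \mathcal{I}(f)] + \sup_{f\in\mathcal{F}_\epsilon^\circ}[\mathcal{H}(f) - \mathcal{I}(f)] - \eta \\
&= -\eta.
\end{aligned}$$

Since $\eta > 0$ is arbitrary and the left-hand side is at most $0$, (A.1) follows.

The proof is complete. ■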
Proof of Proposition 2.8. 

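Proof. Let $\epsilon_1 > 0$ be arbitrary. As in Theorem 2.5, let $\mathcal{F}^{**}$ be the set of minimizers of $\inf_{f\in\partial\mathcal{W}_{v^*}}[\mathcal{I}_q(f)]$. Then

$$\begin{aligned}
E_{\mathbb{Q}_n}|\mathcal{T}(X) - (v^*)^3|
&= \int_{\{\delta_\square(X,\mathcal{F}^{**}) > \epsilon_1\}} |\mathcal{T}(X) - (v^*)^3|\, d\mathbb{Q}_n(X) + \int_{\{\delta_\square(X,\mathcal{F}^{**}) \le \epsilon_1\}} |\mathcal{T}(X) - (v^*)^3|\, d\mathbb{Q}_n(X) \\
&= (I) + (II).
\end{aligned}$$

(We have dropped the superscripts, $\mathbb{Q}_n = \mathbb{Q}_n^{h,\beta,\alpha}$.) We estimate the two terms.

To estimate (I), by [5, Theorem 4.2], there exist $C_2, \epsilon_2 > 0$ such that for sufficiently large $n$

$$\mathbb{Q}_n(\delta_\square(X, \mathcal{F}^{**}) > \epsilon_1) \le C_2 e^{-n^2\epsilon_2}.$$

Since $|\mathcal{T}(X) - (v^*)^3| \le 1$,

$$(I) \le \mathbb{Q}_n(\delta_\square(X, \mathcal{F}^{**}) > \epsilon_1) \le C_2 e^{-n^2\epsilon_2}.$$

To estimate (II), for any $X \in \{\delta_\square(X, \mathcal{F}^{**}) \le \epsilon_1\}$, let the function $f_X^* \in \mathcal{F}^{**}$ be such that $\delta_\square(X, f_X^*) \le \epsilon_1$. Note that $\mathcal{T}(f_X^*) = (v^*)^3$ by definition. By Lipschitz continuity of the mapping $f \mapsto \mathcal{T}(f)$ under the cut distance metric $\delta_\square$ [3, Theorem 3.7],

$$|\mathcal{T}(X) - (v^*)^3| = |\mathcal{T}(X) - \mathcal{T}(f_X^*)| \le C_1\delta_\square(X, f_X^*) \le C_1\epsilon_1.$$

So

$$\begin{aligned}
(II) &= \int_{\{\delta_\square(X,\mathcal{F}^{**}) \le \epsilon_1\}} |\mathcal{T}(X) - (v^*)^3|\, d\mathbb{Q}_n(X) \\
&\le C_1\epsilon_1\,\mathbb{Q}_n(\delta_\square(X, \mathcal{F}^{**}) \le \epsilon_1) \\
&\le C_1\epsilon_1.
\end{aligned}$$

Hence,

$$\lim_{n\to\infty} E_{\mathbb{Q}_n}|\mathcal{T}(X) - (v^*)^3| \le \lim_{n\to\infty} C_2 e^{-n^2\epsilon_2} + C_1\epsilon_1 = C_1\epsilon_1.$$

Since $\epsilon_1$ is arbitrary, (2.27) follows.

If $(q, v^*)$ belongs to the replica symmetric phase, we have by Theorem 2.5 that $\mathcal{F}^{**}$ consists uniquely of the constant function $f^*(x, y) \equiv v^*$. Then since $\mathcal{E}(f^*) = v^*$, the above proof follows identically to yield that

$$\lim_{n\to\infty} E_{\mathbb{Q}_n}|\mathcal{E}(X) - v^*| \le \lim_{n\to\infty} C_2 e^{-n^2\epsilon_2} + C\epsilon_1 = C\epsilon_1.$$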
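Lemma A.2. Let $\mathcal{S}_\alpha$ be defined in Definition 2.10. Then $\mathcal{S}_\alpha \supset \mathcal{S}_{\alpha'}$ for $0 < \alpha < \alpha'$.
Proof. Denote $\phi_p^\alpha(x) = \phi_p(x^{1/3\alpha})$ and let $\hat\phi_p^\alpha(x)$ be the convex minorant of $\phi_p^\alpha(x)$. Then

$$\phi_p^{\alpha'}(x) = \phi_p(x^{1/3\alpha'}) = \phi_p\big((x^{\alpha/\alpha'})^{1/3\alpha}\big) = \phi_p^\alpha(x^{\alpha/\alpha'}).$$

Define $\eta(x) = \hat\phi_p^{\alpha'}(x^{\alpha'/\alpha})$. Let $K$ be the set where $\eta(x) = \phi_p^{\alpha'}(x^{\alpha'/\alpha})$ for $x \in K$. Then

$$\eta(x) \le \phi_p^{\alpha'}(x^{\alpha'/\alpha}) = \phi_p^\alpha(x),$$

with equality occurring iff $x \in K$. (The interpretation of $K$ is that $t^{3\alpha} \in K$ if and only if $(p, t)$ satisfies the minorant condition with $\alpha'$.) Since $\alpha'/\alpha > 1$, the function $\eta(x)$ is convex and is less than $\phi_p^\alpha(x)$, hence it must be less than the convex minorant, $\eta(x) \le \hat\phi_p^\alpha(x)$. For $x \in K$,

$$\phi_p^\alpha(x) = \eta(x) \le \hat\phi_p^\alpha(x) \le \phi_p^\alpha(x),$$

so $(x, \phi_p^\alpha(x))$ lies on the convex minorant $\hat\phi_p^\alpha(x)$ for all $x \in K$. Hence, if $(p, t)$ satisfies the minorant condition with $\alpha'$, then $t^{3\alpha} \in K$ and $(t^{3\alpha}, \phi_p(t))$ lies on the convex minorant $\hat\phi_p^\alpha(x)$, implying that $(p, t)$ satisfies the minorant condition with $\alpha$.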

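Now let $(p, t)$ satisfy the minorant condition with $\alpha'$, and suppose that $\frac{\beta'}{6}$ is a subdifferential of $\hat\phi_p^{\alpha'}(x)$ at the point $t^{3\alpha'}$ such that $\sup[\frac{\beta'}{6}u^{3\alpha'} - \phi_p(u)]$ is uniquely maximized at $t$. According to the arguments in the proof of Lemma 2.9, this means that the subtangent line

$$\ell_{\alpha'}(x) := \frac{\beta'}{6}(x - t^{3\alpha'}) + \phi_p(t)$$

lies below $\phi_p^{\alpha'}(x)$ and touches it at exactly one point $t^{3\alpha'}$. Let $\nu(x) = \ell_{\alpha'}(x^{\alpha'/\alpha})$. We have that $\nu(t^{3\alpha}) = \phi_p^{\alpha'}(t^{3\alpha'}) = \phi_p^\alpha(t^{3\alpha})$ and $\nu'(t^{3\alpha}) = \frac{\beta'}{6}\frac{\alpha'}{\alpha}t^{3(\alpha'-\alpha)}$. Since $\alpha'/\alpha > 1$, $\nu(x)$ is convex, and the line

$$\ell_\alpha(x) := \frac{\beta}{6}(x - t^{3\alpha}) + \phi_p(t),$$

where $\frac{\beta}{6} = \nu'(t^{3\alpha})$, is tangent to $\nu(x)$ at the point $t^{3\alpha}$ and lies below $\nu(x)$. For $x \ne t^{3\alpha}$,

$$\nu(x) = \ell_{\alpha'}(x^{\alpha'/\alpha}) < \phi_p^{\alpha'}(x^{\alpha'/\alpha}) = \phi_p^\alpha(x),$$

so $\ell_\alpha(x)$ lies below $\phi_p^\alpha(x)$ and touches it at exactly one point $t^{3\alpha}$. Moreover, since $\nu(x)$ is a convex function less than $\phi_p^\alpha(x)$, we have $\hat\phi_p^\alpha(x) \ge \nu(x) \ge \ell_\alpha(x)$. So, $\frac{\beta}{6}$ is a subdifferential of $\hat\phi_p^\alpha(x)$ and $\sup[\frac{\beta}{6}u^{3\alpha} - \phi_p(u)]$ is uniquely maximized at $t$.

The proof is complete. ■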

Remark A.3. We note an interesting connection between the subdifferentials $\beta'$ and $\beta$ in the above proof. If $\phi_p(t)$ is differentiable at $t$, then (2.30) explicitly specifies the relationship between the subdifferentials:

$$\beta = \frac{2\phi_p'(t)}{\alpha t^{3\alpha-1}} = \frac{2\phi_p'(t)}{\alpha' t^{3\alpha'-1}}\cdot\frac{\alpha' t^{3\alpha'-1}}{\alpha t^{3\alpha-1}} = \beta'\,\frac{\alpha'}{\alpha}\,t^{3(\alpha'-\alpha)}.$$

This is consistent with the derivation in the above proof.
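As a quick numerical check of this relation, take $p = 0.2$, $t = 0.3$, $\alpha = 2/3$ and $\alpha' = 1$. The sketch assumes the replica symmetric identity $\phi_p(t) = I_p(t)$, so that $\phi_p'(t) = \frac12(h_t - h_p)$ and $\beta = (h_t - h_p)/(\alpha t^{3\alpha-1})$; for $\alpha = 1$ this reduces to $\beta^* = (h_t - h_p)/t^2$ as used in Section 4. The function name `beta_for_alpha` is hypothetical.

```python
import math

def logit(x):
    return math.log(x / (1.0 - x))

def beta_for_alpha(p, t, alpha):
    # assumes phi_p'(t) = (h_t - h_p) / 2, i.e. the replica symmetric case
    return (logit(t) - logit(p)) / (alpha * t ** (3.0 * alpha - 1.0))

p, t = 0.2, 0.3
alpha, alpha_prime = 2.0 / 3.0, 1.0
beta = beta_for_alpha(p, t, alpha)
beta_prime = beta_for_alpha(p, t, alpha_prime)

# right-hand side of the relation in Remark A.3
rhs = beta_prime * (alpha_prime / alpha) * t ** (3.0 * (alpha_prime - alpha))
print(beta, rhs)
```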



Proof of Proposition 4.2. 
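Proof. The proof is identical to that of Theorem 2.5 with a few obvious modifications. For $u \in J$, suppose $f \in A_J \cap \partial\mathcal{W}_u$. Similarly to (2.26),

$$\mathcal{H}(f) - \mathcal{I}(f) = \frac{\beta}{6}u^{3\alpha} - \mathcal{I}_q(f) - \frac{1}{2}\log(1-q),$$

so

$$\begin{aligned}
\sup_{f\in A_J}[\mathcal{H}(f) - \mathcal{I}(f)]
&= \sup_{u\in J}\ \sup_{f\in A_J\cap\partial\mathcal{W}_u}[\mathcal{H}(f) - \mathcal{I}(f)] \\
&= \sup_{u\in J}\Big[\frac{\beta}{6}u^{3\alpha} - \inf_{f\in A_J\cap\partial\mathcal{W}_u}[\mathcal{I}_q(f)]\Big] - \frac{1}{2}\log(1-q) \\
&= \frac{\beta}{6}(v^*)^{3\alpha} - \mathcal{I}_q(f^*) - \frac{1}{2}\log(1-q),
\end{aligned}$$

where now

$$v^* = \arg\sup_{u\in J}\Big[\frac{\beta}{6}u^{3\alpha} - \inf_{f\in A_J\cap\partial\mathcal{W}_u}[\mathcal{I}_q(f)] - \frac{1}{2}\log(1-q)\Big]$$

and $f^*$ is any function in $\mathcal{F}^{**}$. The supremum $\sup_{f\in A_J}[\mathcal{H}(f) - \mathcal{I}(f)]$ is attained on the set $\mathcal{F}^{**}$. This concludes the proof of (4.5). The proof of the second part of Proposition 4.2 follows identically to the proof in Theorem 2.5 and is omitted. ■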



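We collect here some basic asymptotic properties of some probabilities of interest.
Lemma A.4. Let $A_J$ be defined in (4.4).
(i) Suppose $p \in J$. Let $\kappa_{n,p,A_J} := \frac{1}{n^2}\log\sum_{X\in A_J} e^{n^2\frac{h_p}{2}\mathcal{E}(X)}$ and $\kappa_{n,p} = \frac{1}{n^2}\log\sum_{X\in\Omega_n} e^{n^2\frac{h_p}{2}\mathcal{E}(X)}$ be the normalizing constants for $\mathbb{P}_{n,p,A_J}$ and $\mathbb{P}_{n,p}$, respectively. Then

$$\lim_{n\to\infty}\kappa_{n,p,A_J} = \lim_{n\to\infty}\kappa_{n,p} = -\frac{1}{2}\log(1-p).$$

Moreover, $\mathbb{P}_{n,p}(A_J) \to 1$ as $n \to \infty$.
(ii) Assume that $t \in J^\circ$ is in the interior of $J$. Also assume that $\delta_\square(\mathcal{F}^*, A_J^c) > 0$, where $\mathcal{F}^* \subset \mathcal{W}$ is the set of minimizers of the LDP rate function $\inf_{f\in\mathcal{W}_t}[\mathcal{I}_p(f)]$. Then

$$\lim_{n\to\infty}\frac{1}{n^2}\log\mathbb{P}_{n,p}(\mathcal{W}_t\cap A_J^c) < -\inf_{f\in\mathcal{W}_t}[\mathcal{I}_p(f)]$$

and

$$\lim_{n\to\infty}\frac{1}{n^2}\log\mathbb{P}_{n,p,A_J}(\mathcal{W}_t) = -\inf_{f\in\mathcal{W}_t\cap A_J}[\mathcal{I}_p(f)] = -\inf_{f\in\mathcal{W}_t}[\mathcal{I}_p(f)].$$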
Proof. (i) From Theorem 4.2, since $p = \arg\sup_{0\le u\le 1}[V(u)]$ and $p \in J$,

$$\lim_{n\to\infty}\kappa_{n,p,A_J} = \sup_{u\in J}[V(u)] = \sup_{0\le u\le 1}[V(u)] = \lim_{n\to\infty}\kappa_{n,p}.$$

From (2.1), $\lim_{n\to\infty}\kappa_{n,p} = -\frac{1}{2}\log(1-p)$. It follows directly that $\mathbb{P}_{n,p}(A_J) \to 1$ as $n \to \infty$.
(ii) Since $\delta_\square(\mathcal{F}^*, A_J^c) > 0$,
If $r$ is the smallest value larger than $t$ in $J$, then $-\inf_{f\in\mathcal{W}_t\cap A_J^c}[\mathcal{I}_p(f)] < -\inf_{f\in\mathcal{W}_t}[\mathcal{I}_p(f)]$,
so

Jim ^logP niP (W,n^) < -ma X { /e mf A J/ p (/)], ^IMf)]} < 



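Also,

$$\begin{aligned}
\lim_{n\to\infty}\frac{1}{n^2}\log\mathbb{P}_{n,p,A_J}(\mathcal{W}_t)
&= \lim_{n\to\infty}\frac{1}{n^2}\log\mathbb{P}_{n,p}(\mathcal{W}_t\cap A_J) - \lim_{n\to\infty}\frac{1}{n^2}\log\mathbb{P}_{n,p}(A_J) \\
&= -\inf_{f\in\mathcal{W}_t\cap A_J}[\mathcal{I}_p(f)] = -\inf_{f\in\mathcal{W}_t}[\mathcal{I}_p(f)].
\end{aligned}$$

The last equality follows since $t \in J$ and $\inf_{f\in\mathcal{W}_t}[\mathcal{I}_p(f)] = \inf_{f\in\partial\mathcal{W}_t}[\mathcal{I}_p(f)]$ (Theorem 4.3 in [8]). ■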
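The limit of the unconstrained normalizing constant in part (i) can be verified directly for small $n$. The sketch below assumes $\mathcal{E}(X) = 2e(X)/n^2$, so that the summand $e^{n^2\frac{h_p}{2}\mathcal{E}(X)}$ equals $e^{h_p e(X)}$ with $e(X)$ the number of edges, and it groups graphs by their edge count. The function name `kappa` is hypothetical.

```python
import math

def kappa(n, p):
    """(1/n^2) log of the sum over all simple graphs X on n vertices of
    exp(h_p * e(X)), grouping the graphs by their number of edges."""
    h_p = math.log(p / (1.0 - p))
    pairs = n * (n - 1) // 2
    total = sum(math.comb(pairs, k) * math.exp(h_p * k)
                for k in range(pairs + 1))
    return math.log(total) / n**2

p = 0.2
for n in (5, 10, 20, 30):
    # the binomial theorem gives the closed form -(pairs / n^2) * log(1 - p)
    closed_form = -(n * (n - 1) // 2) / n**2 * math.log(1.0 - p)
    print(n, kappa(n, p), closed_form)

limit = -0.5 * math.log(1.0 - p)
```

The closed form equals $-\frac{n-1}{2n}\log(1-p)$, which tends to $-\frac12\log(1-p)$ as $n \to \infty$.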

Proof of Corollary 4.3. 

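Proof. Denote

$$\hat q_{n,A_J} = 1_{\mathcal{W}_t}\frac{d\mathbb{P}_{n,p,A_J}}{d\mathbb{Q}_{n,A_J}^{h,\beta,\alpha}}.$$

(For brevity, we drop the sub/superscripts $p, h, \beta, \alpha, A_J$ if no ambiguity arises; thus $\mathbb{Q}_n$ denotes the Gibbs measure $\mathbb{Q}_n^{h,\beta,\alpha}$ and $\tilde{\mathbb{Q}}_n$ denotes the conditional measure $\mathbb{Q}_{n,A_J}^{h,\beta,\alpha}$, similarly for $\mathbb{P}_n$, $\tilde{\mathbb{P}}_n$.) Under $\tilde{\mathbb{Q}}_n$, the second moment is

$$\begin{aligned}
E_{\tilde{\mathbb{Q}}_n}[\hat q_{n,A_J}^2]
&= \frac{E_{\mathbb{P}_n}\big[1_{\mathcal{W}_t\cap A_J}\frac{d\tilde{\mathbb{P}}_n}{d\tilde{\mathbb{Q}}_n}\big]}{\mathbb{P}_n(A_J)} \\
&= (\mathbb{P}_n(A_J))^{-1}\,E_{\mathbb{P}_n}\Big[1_{\mathcal{W}_t\cap A_J}\,e^{-n^2\left(\frac{h-h_p}{2}\mathcal{E}(X) + \frac{\beta}{6}\mathcal{T}(X)^\alpha\right)}\Big]\,e^{n^2(\psi_{n,A_J} - \kappa_{n,p,A_J})},
\end{aligned}$$

where $\kappa_{n,p,A_J} = \frac{1}{n^2}\log\sum_{X\in A_J} e^{n^2\frac{h_p}{2}\mathcal{E}(X)}$ is the normalizing constant for $\mathbb{P}_{n,p,A_J}$.
With $h = h_p$, we apply the Laplace principle and Lemma A.4(i),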



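$$\begin{aligned}
\lim_{n\to\infty}\frac{1}{n^2}\log E_{\tilde{\mathbb{Q}}_n}[\hat q_{n,A_J}^2]
&= -\inf_{f\in\mathcal{W}_t\cap A_J}\Big[\mathcal{I}_p(f) + \frac{\beta}{6}\mathcal{T}(f)^\alpha + \frac{h-h_p}{2}\mathcal{E}(f)\Big] + \lim_{n\to\infty}\psi_{n,A_J} + \frac{1}{2}\log(1-p) \\
&\le -\inf_{f\in\mathcal{W}_t\cap A_J}[\mathcal{I}_p(f)] - \frac{\beta}{6}t^{3\alpha} + \frac{\beta}{6}t^{3\alpha} - \inf_{f\in\partial\mathcal{W}_t}[\mathcal{I}_p(f)] \\
&= -\inf_{f\in\mathcal{W}_t\cap A_J}[\mathcal{I}_p(f)] - \inf_{f\in\partial\mathcal{W}_t}[\mathcal{I}_p(f)] \\
&= -2\inf_{f\in\mathcal{W}_t}[\mathcal{I}_p(f)].
\end{aligned}$$

Hence the importance sampling scheme is asymptotically optimal. ■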